DESCRIPTION

RATIONAL INPUT BUFFER ARRANGEMENTS FOR AUXILIARY
INFORMATION IN VIDEO AND AUDIO SIGNAL PROCESSING SYSTEMS

Technical Field 

The present invention relates to apparatus for
compressing and expanding digital information signals,
and, in particular, to the buffering of auxiliary
information included with information signals compressed
with a dynamically varying compression ratio.

Background Art 

For storage on or distribution via such media as
CD-ROMs, laser disks (LDs), video tapes, magneto-optical
(MO) storage media, digital compact cassette (DCC),
terrestrial or satellite broadcasting, cable systems,
fibre-optic distribution systems, telephone systems, ISDN
systems, etc., video and audio signals are compressed and
coded, and the resulting video stream and audio stream are
then multiplexed to provide a bit stream for feeding to
the medium. The bit stream is later reproduced from the
medium and demultiplexed, and the resulting video stream
and audio stream are decoded and expanded to recover the
original audio and video signals.

Two of the main international standards related to
compressing audio and video signals for storage on or
distribution via a medium are those known as MPEG-1 and
MPEG-2. These standards have been established by the
Moving Picture Experts Group (MPEG) operating under the
auspices of the International Organization for
Standardization (ISO) and the International
Electrotechnical Commission (IEC).
The MPEG standards are established under the
assumption that they will be used in a wide range of
applications. As a result, the standards allow for such
possibilities as a phase-locked system, in which the
sampling rate clock of the audio signal is phase-locked to
the same clock reference (SCR) as the frame rate clock of
the video signal, and a non-phase-locked system, in which
the sampling rate clock of the audio system and the frame
rate clock of the video system operate independently.
Irrespective of whether the system is phase-locked, the
MPEG standards require the addition of a time stamp to the
multiplexed bit stream at least once every 0.7 seconds,
and that the encoder provide separate time stamps for use
by the audio decoder and by the video decoder.
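
The 0.7-second requirement can be expressed as a check on the
spacing of consecutive time stamps. The following Python sketch is
illustrative only: the function name and the representation of the
time stamps as a list of times in seconds are assumptions made here,
not part of the MPEG standards.

```python
from typing import Sequence


def time_stamp_spacing_ok(stamp_times_s: Sequence[float],
                          limit_s: float = 0.7) -> bool:
    """Return True if no two consecutive time stamps are more than
    limit_s seconds apart (hypothetical helper; times in seconds)."""
    pairs = zip(stamp_times_s, stamp_times_s[1:])
    return all(later - earlier <= limit_s for earlier, later in pairs)


# Gaps of 0.5 s and 0.6 s satisfy the requirement.
print(time_stamp_spacing_ok([0.0, 0.5, 1.1]))
# A gap of 0.8 s violates it.
print(time_stamp_spacing_ok([0.0, 0.8]))
```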

One of the aims of the MPEG standards is to provide
flexibility for encoder and decoder design while
ensuring that the bit stream provided by any encoder can
be successfully decoded by any decoder. One of the ways
in which this compatibility is established is by the
concept of the System Target Decoder.

A typical audio and video signal processing system 110
according to the MPEG-1 and MPEG-2 standards is shown in
Figure 1. In this, the encoder 100 receives the video
signal S2 from the video signal storage medium 2, and
receives the audio signal S3 from the audio signal storage
medium 3. The audio signal S3 could alternatively be (and
more usually is) received from the video signal
storage medium 2 instead of from a separate audio storage
medium.

The encoder 100 compresses and codes the video and
audio signals, and multiplexes the resulting audio stream
and video stream to provide the multiplexed bit stream
S100, which is fed for storage or distribution by the
medium 5. The medium can be any medium suitable for
storing or distributing a digital bit stream, for example,
a CD-ROM, a laser disk (LD), a video tape, a
magneto-optical (MO) storage medium, a digital compact
cassette (DCC), a terrestrial or satellite broadcasting
system, a cable system, a fibre-optic distribution system,
a telephone system, an ISDN system, etc.

The encoder 100 compresses and codes the video signal
picture-by-picture. Each picture of the video signal is
compressed in one of three compression modes. A picture
compressed in the intra-picture compression mode is called
an I-picture. In the intra-picture compression mode, the
picture is compressed by itself without reference to other
pictures of the video signal. Pictures compressed in the
inter-picture compression mode are called P-pictures or
B-pictures. A P-picture is compressed using forward
prediction coding using as a reference picture a previous
I-picture or P-picture, i.e., a picture occurring earlier
in the video signal. Each block of a B-picture may use as
a reference block any one of the following: a block of a
previous I-picture or P-picture, a block of a following
P-picture or I-picture (i.e., a picture occurring later in
the video signal), or a block obtained by performing
linear processing on a block of a previous I-picture or
P-picture and a block of a following I-picture or
P-picture. In addition, blocks of a B-picture may be
compressed in the intra-picture compression mode.
Typically, about 150 Kbits (Kb; 1 Kb = 1024 bits) of the
video stream are required for an I-picture, 75 Kb of the
video stream are required for a P-picture, and 5 Kb of the
video stream are required for a B-picture.
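
The effect of these typical picture sizes on the video stream can be
worked through numerically. The sketch below is illustrative only:
the 12-picture pattern and the picture rate of 30 pictures per second
are assumptions chosen for the example, and the helper names are
hypothetical.

```python
# Typical picture sizes quoted above, in Kbits (1 Kb = 1024 bits).
TYPICAL_KBITS = {"I": 150, "P": 75, "B": 5}


def sequence_bits(pattern: str) -> int:
    """Total bits of video stream for a string of picture types."""
    return sum(TYPICAL_KBITS[kind] * 1024 for kind in pattern)


def average_bit_rate(pattern: str, pictures_per_second: float) -> float:
    """Average bit rate if the pattern repeats indefinitely."""
    return sequence_bits(pattern) * pictures_per_second / len(pattern)


# Assumed pattern: 1 I-picture, 3 P-pictures and 8 B-pictures.
pattern = "IBBPBBPBBPBB"
print(sequence_bits(pattern))           # 424960
print(average_bit_rate(pattern, 30.0))  # 1062400.0
```

Under these assumptions the single I-picture accounts for more than a
third of the bits in the pattern, which is why the amount removed
from the video input buffer varies so strongly from picture to
picture.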

The digital video and audio processing system 110 also
includes the decoder 600, which receives as its input
signal the bit stream S5 from the medium 5. The decoder
performs demultiplexing inverse to the multiplexing
performed in the encoder 100. The decoder also applies
decoding and expansion to the resulting audio stream and
video stream using processing complementary to that
performed by the encoder 100 to provide the recovered
video signal 6A and the recovered audio signal 6B. The
recovered video signal 6A and the recovered audio signal
6B respectively closely match the video signal S2 and the
audio signal S3 fed into the encoder 100.

Figure 1 also shows the system target decoder (STD)
400, which is used to define the processing performed by
the encoder 100 and the decoder 600. In practical video
and audio signal processing systems, the encoder seldom
includes an actual system target decoder, but instead
performs the encoding processing and multiplexing taking
account of the system target decoder parameters. Also, in
practical systems, the decoder is designed to have
performance equalling or exceeding that of the system
target decoder. These relationships between the system
target decoder and the encoder and the decoder are
indicated in Figure 1 by the broken line labelled S4A
interconnecting the system target decoder and the encoder,
and the broken line labelled S4B interconnecting the
system target decoder and the decoder.

The system target decoder 400 is also known as a
hypothetical system target decoder, system reference
decoder, or reference decoding processing system. From
now on it will be referred to as a system target decoder.

System target decoders are defined in international
standard specifications such as CCITT H.261 and the MPEG-1
standard to provide guidelines for the designers of video
and audio encoders and decoders for these standards.

In the MPEG-1 system standard, the system target
decoder includes a reference video decoder and a reference
audio decoder. In addition, the system target decoder
includes an input buffer for the reference video decoder
and an input buffer for the reference audio decoder. The
size of each input buffer is defined in the standard. The
standard also defines the operation of the two reference
decoders, especially with regard to the way in which they
remove the audio stream and the video stream from their
respective buffers.

The concept of the system target decoder provides
compatibility between encoders and decoders of different
designs as follows. All encoders are designed to provide
a bit stream that can be successfully decoded by the
system target decoder, and that does not cause the
respective input buffers in the system target decoder to
overflow or underflow. In addition, all decoders are
designed to have performance parameters that are equal to
or better than those defined for the system target
decoder. As a result, all such decoders will be capable
of successfully decoding the bit stream produced by any of
the encoders designed to produce a bit stream capable of
being decoded by the system target decoder. The bit
stream produced for decoding by the system target decoder
is called a "constraint system parameter stream".

The structure of the hypothetical system target
decoder 400 shown in Figure 1 is as follows. The
demultiplexer 401 notionally receives the bit stream S100
from the encoder 100. The demultiplexer 401 demultiplexes
the bit stream into a video stream and an audio stream.
The video stream is fed to the video input buffer 402, the
output of which is connected to the video decoder 405.
The audio stream from the demultiplexer 401 is fed into
the audio input buffer 403, the output of which is
connected to the audio decoder 406. In the example shown
in Figure 1, the video input buffer 402 has a storage
capacity of 46K bytes and the audio input buffer 403 has a
storage capacity of 4K bytes, as specified by the MPEG-1
standard. The video decoder 405 removes the video stream
from the video input buffer 402 one video access unit at a
time, i.e., one picture at a time, at a timing
corresponding to the picture rate of the video signal,
e.g., once every 1/29.97 seconds in an NTSC system. The
amount of the video stream removed from the video input
buffer for each picture varies because of the different
amount of compression applied to each picture. The audio
decoder 406 removes the audio stream from the audio input
buffer 403 one audio access unit at a time, at a
predetermined timing.

It is desirable from the standpoint of the
construction of the system, and to maximize flexibility,
that, in the real decoder 600, the element corresponding
to the demultiplexer 401 in the STD include a switching
circuit, and that the elements corresponding to the video
decoder 405 and the audio decoder 406 in the STD be
provided using a high-speed data processor (DSP) having a
configuration suitable for performing high-speed signal
processing operations. Such processors normally cannot
include a large amount of storage for cost reasons.
Therefore, the MPEG standards take these practical
considerations into account and set the storage capacities
of the video input buffer 402 and the audio input buffer
403 to the relatively small values set forth above.

Figure 2 shows the structure of the constraint
parameter (multiplex) system bit stream CPSP that is
notionally fed into the system target decoder 400. The
bit stream shown in Figure 2 has a multi-layer structure,
and includes various headers in a multiplex layer and the
audio stream and the video stream in a signal layer. In
this structure, plural packs are serially arranged in
time. Each pack begins with a pack header, and includes
at least one video packet and at least one audio packet.
Each video packet begins with a packet header and includes
the video stream of at least part of at least one picture.
One video packet will accommodate the video stream of more
than one B-picture, but several video packets are
required to accommodate the video stream of one I-picture.
There is no requirement that a picture begin immediately
after the packet header: the picture may start at any
point in the video packet.

Each video packet header may include at least one
video time stamp showing the presentation time of the
first picture that begins in the packet. If the first
picture is an I-picture or a P-picture, and its decoding
time differs from its presentation time, a decoding time
stamp may also be included. The purpose and use of the
video time stamps will be described below.

Each audio packet includes at least one audio access
unit of the audio stream, and begins with an audio packet
header. The audio packet header may include a
presentation time stamp showing the output timing of the
audio signal obtained by decoding the first audio access
unit beginning in the audio packet. Each audio access
unit is about 384 bytes in MPEG-1.

Figure 2 shows a video packet that includes the video
stream of the end of the picture i, and the video stream of
at least the beginning of the picture i+1. The video time
stamp vts included in the video packet header shown is the
video time stamp of the picture i+1, because the picture
i+1 is the first picture that begins in the video packet.
Figure 2 also shows the audio packet that includes the
audio signal of the end of the access unit j, and the audio
signal of the access units j+1 and j+2. The audio time
stamp ats included in the audio packet header is the time
stamp of the audio access unit j+1, because the access
unit j+1 is the first access unit that begins in the audio
packet.
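
The rule that a packet header carries the time stamp of the first
picture or access unit that begins in the packet can be sketched as
follows. This is a minimal illustration, assuming that the start of
each access unit is known as a byte offset in the elementary stream;
the function name and the offsets are hypothetical.

```python
from typing import Optional, Sequence


def first_unit_beginning_in_packet(unit_starts: Sequence[int],
                                   packet_start: int,
                                   packet_end: int) -> Optional[int]:
    """Index of the first access unit whose first byte lies in the
    packet payload [packet_start, packet_end), or None if no access
    unit begins there. unit_starts must be in stream order."""
    for index, start in enumerate(unit_starts):
        if packet_start <= start < packet_end:
            return index
    return None


# Access units j, j+1, j+2, j+3 begin at these made-up byte offsets.
unit_starts = [0, 100, 250, 400]
# The packet carries bytes 60..399: the end of unit 0 (j) and all of
# units 1 (j+1) and 2 (j+2), so its time stamp belongs to unit 1.
print(first_unit_beginning_in_packet(unit_starts, 60, 400))   # 1
```

A packet in which no access unit begins yields None, which
corresponds to a packet header that carries no time stamp.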

The encoder 100 compresses and codes the video signal
S2 and at least codes the audio signal S3 to provide a
video stream and an audio stream, respectively, and
multiplexes the audio stream, the video stream, and the
various headers to provide the multiplexed bit stream S100
having the format shown in Figure 2. The encoder feeds
the multiplexed bit stream to the medium 5 for
transmission or storage. The multiplexed bit stream is
such that, if the encoder had fed the multiplexed bit
stream to the system target decoder 400 for decoding, the
system target decoder would have decoded the multiplexed
bit stream successfully, and no overflow or underflow
would have occurred in either of the input buffers in the
system target decoder.

Because of the requirement that the multiplexed bit
stream S100 be capable of being successfully decoded by
the system target decoder 400, the encoder 100 applies a
dynamically varying compression and coding processing to
at least the video signal S2. The compression ratio of
the compression applied by the encoder 100 varies with
time. Moreover, since the amount of the video stream that
can be used to represent a picture of the video signal S2
depends on the occupancy of the video input buffer of the
system target decoder at the instant that the picture is
compressed, the amount of compression applied to a given
picture varies dynamically. The amount of the video
stream derived from a given video sequence will differ if
the given video sequence is processed on different
occasions. Accordingly, the compression ratio of at least
the video stream produced by the encoder 100 varies
constantly.

As shown above, the audio stream and the video stream
are time multiplexed to provide the multiplexed bit stream
S100. The audio stream of the audio signal belonging to a
given picture of the video signal is located in the
multiplexed bit stream some time earlier or later than the
video stream of the picture. As a result of this, the
decoder 600 must provide timing synchronization between
the recovered video signal produced by expanding the video
stream, and the recovered audio signal produced by
expanding the audio stream. To provide this
synchronization, the MPEG standard stipulates that the
encoder add the above-mentioned time stamps to at least
some of the video packet headers and the audio packet
headers. The video time stamps and the audio time stamps
show timings prescribing the clocks to be used to perform
synchronized decoding of the video stream and the audio
stream. The video time stamps and the audio time stamps
also show the times at which units (i.e., pictures) of the
recovered video signal and units of the recovered audio
signal obtained by expanding respective access units of
the video stream and the audio stream are to be presented
at the decoder output. Such timing information is
necessary to prevent audio/video synchronization errors
from occurring if the decoder is unable to decode lost or
corrupted audio or video access units. This will be
described in more detail below.

Figure 3 shows the structure of the decoder 600. In
the decoder 600, the demultiplexer 601 receives the
multiplexed bit stream from the medium 5. The
demultiplexer 601 demultiplexes the multiplexed bit stream
into the video stream, the video time stamps, the audio
stream, and the audio time stamps. The video time stamps
and the audio time stamps are respectively fed to the
picture rate control circuit 698 and the sampling rate
control circuit 699 for use in decoding the video stream
and the audio stream, respectively. The video stream from
the output of the demultiplexer 601 is fed into the video
input buffer 602, which precedes the video decoder 605.
The audio stream from the demultiplexer is fed into the
audio input buffer 603, which precedes the audio decoder
606.

The video decoder 605 removes each access unit of the
video stream from the video input buffer 602 for decoding
in the order in which the access unit was received by the
video input buffer. The video decoder 605 decodes the
video stream removed from the video input buffer 602 in
response to timing signals received from the picture rate
control circuit 698. The picture rate control circuit is,
in turn, controlled by the video time stamps fed from the
demultiplexer 601. Similarly, the audio decoder 606
removes each access unit of the audio stream from the
audio input buffer 603 for decoding in the order in which
the access unit was received by the audio input buffer.
The audio decoder 606 decodes the audio stream removed
from the audio input buffer 603 in response to timing
signals received from the sampling rate control circuit
699. The sampling rate control circuit is, in turn,
controlled by the audio time stamps fed from the
demultiplexer 601.

The video input buffer 602 and the audio input buffer
603 will be described in detail next. The elementary
streams entering the decoders must be buffered for the
following reasons. The first reason is that, as mentioned
above, the compression ratios constantly change. The
second reason is that the average transfer rate of each
elementary stream from the medium 5 differs from the
average input rate of the elementary stream to its
respective decoder, depending on clock error. The third
reason is that the decoders normally receive access units
of their respective streams intermittently, so that the
instantaneous transfer rate of the elementary stream in
the multiplexed bit stream S5 from the medium 5 and the
instantaneous input rate of the elementary stream to its
respective decoder do not match. Therefore, the input
buffers 602 and 603 are provided between the demultiplexer
601 and the video decoder 605 and the audio decoder 606,
respectively, to adjust the differences in the average
transfer rate and the average input rate, and in the
instantaneous transfer rate and the instantaneous input
rate.

Figures 4B to 4D are graphs of bit index curves
showing the time dependency of the transfer of the audio
stream in the multiplexed signal from the medium 5 into
the audio input buffer 603 and the input of the audio
stream into the audio decoder 606 from the audio input
buffer. The arrangement of the audio input buffer 603 and
the audio decoder 606 is shown in Figure 4A.

The bit index curves show the relationship between
the total number of bits (shown on the y-axis) that have
passed a given point in the circuit and the time (shown on
the x-axis).

Figure 4B shows the average bit index at the point IA
at the input of the audio input buffer 603, which reflects
the average rate at which the audio stream is transferred
from the medium. The curve shows that the average
transfer rate of the audio stream from the medium is more
or less constant. However, the curve is not a straight
line because the transfer rate varies with time due to
clock drift.

Figure 4C shows the actual bit index at the point IA
at the input to the audio input buffer 603. No bits are
fed into the audio input buffer at first, because the
demultiplexer is feeding the video stream into the video
input buffer. Then, the demultiplexer 601 encounters the
first audio packet in the multiplexed bit stream, and
feeds the audio access units contained therein into the
audio input buffer 603. Following the first audio packet,
the demultiplexer ceases transfer of the audio stream into
the audio input buffer during the time it feeds the
contents of the next video packet(s) into the video input
buffer. Then, the demultiplexer 601 encounters another
audio packet in the multiplexed bit stream and feeds the
audio access units contained therein into the audio input
buffer. This process is repeated throughout the decoding
process.
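
The alternation between the two input buffers can be sketched as
follows. This is a minimal illustration, assuming the multiplexed bit
stream is represented as a list of (kind, payload bits) pairs; the
function name and the packet sizes are hypothetical.

```python
from typing import List, Sequence, Tuple


def input_bit_indices(
        packets: Sequence[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Return (video_bits, audio_bits) delivered to the two input
    buffers after each packet. Each packet is (kind, payload_bits)
    with kind either "video" or "audio"."""
    video = audio = 0
    curve = []
    for kind, bits in packets:
        if kind == "audio":
            audio += bits
        else:
            video += bits
        curve.append((video, audio))
    return curve


# Made-up packet sequence: video, audio, video, audio.
packets = [("video", 800), ("audio", 300), ("video", 800), ("audio", 300)]
print(input_bit_indices(packets))
# [(800, 0), (800, 300), (1600, 300), (1600, 600)]
```

While one bit index rises the other stays flat, which is the
intermittent transfer that the input buffers have to absorb.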

Figure 4D shows the bit index at the point OA at the
output of the audio input buffer 603 as the audio stream
is removed from the audio input buffer by the audio
decoder 606. The audio decoder removes the audio stream
from the audio input buffer one access unit at a time.
Removal of the access unit takes place instantaneously,
once every 24 ms, for example.
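
The stepped shape of the output bit index can be reproduced with a
short calculation. The sketch below assumes access units of exactly
384 bytes (the text gives "about 384 bytes") removed every 24 ms; the
function name and the use of integer milliseconds are assumptions
made for the example.

```python
def audio_output_bit_index(t_ms: int, unit_bytes: int = 384,
                           period_ms: int = 24, start_ms: int = 0) -> int:
    """Total bits removed from the audio input buffer up to and
    including time t_ms, with one whole access unit removed
    instantaneously every period_ms, starting at start_ms."""
    if t_ms < start_ms:
        return 0
    units_removed = (t_ms - start_ms) // period_ms + 1
    return units_removed * unit_bytes * 8


print(audio_output_bit_index(0))    # 3072 (first access unit removed)
print(audio_output_bit_index(23))   # 3072 (no change until the next removal)
print(audio_output_bit_index(24))   # 6144 (second access unit removed)
```

With these assumed values the average removal rate is 3072 bits per
24 ms, i.e., 128,000 bits per second.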

When each picture of the video signal is compressed
and subject to variable length coding in the encoder 100,
the amount of video stream produced changes significantly
from picture to picture, depending on the mode in which
the video signal of the picture was compressed, as
described above. Accordingly, the input rate at which the
video decoder 605 removes the video stream from the video
input buffer 602 also changes significantly from picture
to picture. As a result, the storage capacity of the
video input buffer 602 is required to be considerably
larger than the storage capacity of the audio input buffer
603. For example, the MPEG-1 standard requires that the
size, i.e., the storage capacity, of the video input
buffer 602 be 46K bytes, whereas the standard sets the
size of the audio input buffer at only 4K bytes.

Figures 5A to 5D include three bit index curves
showing the time dependency of the transfer of the video
stream in the multiplexed signal from the medium 5 into
the video input buffer 602 and the input of the video
stream into the video decoder 605 from the video input
buffer. The arrangement of the video input buffer 602 and
the video decoder 605 is shown in Figure 5A.

Figure 5B shows the average bit index at the point IV
at the input of the video input buffer 602, which reflects
the average rate at which the video stream is transferred
from the medium. The curve shows that the average
transfer rate of the video stream from the medium is more
or less constant. However, the curve is not a straight
line because the transfer rate varies gradually with time
due to clock drift.

Figure 5C shows the actual bit index at the point IV
at the input to the video input buffer 602. The video
stream is first fed into the video input buffer at a
substantially constant rate until the demultiplexer 601
encounters the first audio packet in the multiplexed bit
stream. The demultiplexer interrupts feeding the video
stream into the video input buffer while it feeds the
contents of the audio packet into the audio input buffer
603. During this interruption, the bit index remains
unchanged. At the end of the first audio packet, the
demultiplexer 601 demultiplexes the video packet header of
the following video packet, and then resumes transferring
the video stream into the video input buffer until it
encounters another audio packet in the multiplexed bit
stream. This process is repeated throughout the decoding
process.

Figure 5D shows the bit index at the point OV at the
output of the video input buffer 602 as the video stream
is removed from the video input buffer by the video
decoder 605. The video decoder removes the video stream
from the video input buffer one access unit, i.e., one
picture, at a time. Removal of the access unit takes
place instantaneously, once every picture period, e.g.,
once every 33.4 ms in an NTSC system. The amount of the
video stream removed each time depends on the mode in
which the picture was compressed by the encoder. Figure
5D shows an example in which a sequence of B-pictures is
followed by an I-picture, which is followed by a sequence
of B-pictures. It can be seen that a much greater amount
of video stream is removed from the video input buffer for
one I-picture than for one B-picture.
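
The unequal step sizes can be illustrated by accumulating the typical
picture sizes quoted earlier (150 Kb, 75 Kb and 5 Kb, with
1 Kb = 1024 bits). The following sketch is illustrative only; the
function name and the five-picture sequence are assumptions.

```python
from itertools import accumulate
from typing import List

KBIT = 1024  # 1 Kb = 1024 bits, as defined earlier in the text
TYPICAL_BITS = {"I": 150 * KBIT, "P": 75 * KBIT, "B": 5 * KBIT}


def video_output_bit_index(pattern: str) -> List[int]:
    """Bit index at the buffer output after each picture period,
    one entry per picture in the pattern."""
    return list(accumulate(TYPICAL_BITS[kind] for kind in pattern))


# Two B-pictures, one I-picture, then two more B-pictures.
print(video_output_bit_index("BBIBB"))
# [5120, 10240, 163840, 168960, 174080]
```

The single I-picture raises the bit index thirty times as far as each
B-picture does.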

Figures 6A and 6B show the buffering provided by the
video input buffer 602 or the audio input buffer 603. In
these Figures, the video input buffer 602 is used as an
example. The figures are both bit index curves. Figure
6A shows ideal buffering, in which the video input buffer
602 is used simply to accommodate the differences between
the transfer rate of the video stream from the medium and
the input rate of the video stream to the video decoder
605. The video stream is fed into the video input buffer
602 from the demultiplexer 601 at a substantially constant
transfer rate, as indicated by the straight line marked IS
in Figure 6A. The video decoder removes the video stream
from the video input buffer one access unit, i.e., one
picture, at a time, as shown. The amount of video stream
removed for any one picture can vary from about 150 Kbits
for an I-picture to about 5 Kbits for a B-picture. Thus,
the video stream bit index at the output of the video
input buffer changes in steps, the step size of which
depends on the number of bits used to encode each picture,
as indicated by the stepped curve marked OS.

In the ideal buffering illustrated in Figure 6A, both
of the following conditions are met at all times:

(a) the difference between the amount of the video
stream transferred into the video input buffer 602 from
the medium and the storage capacity of the video input
buffer 602 (indicated by the broken line SC) does not
exceed the amount of the video stream removed from the
video input buffer by the video decoder, i.e., there is no
overflow; and

(b) the amount of the video stream removed from the
video input buffer 602 by the video decoder 605 does not
exceed the amount of the video stream transferred into the
video input buffer from the medium, i.e., there is no
underflow.
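
Conditions (a) and (b) can be tested directly on sampled bit index
curves. The sketch below is illustrative only: the function name, the
sampled values and the choice to express the 46K-byte capacity in
bits are assumptions made for the example.

```python
from typing import List, Sequence


def check_buffer(transferred: Sequence[int], removed: Sequence[int],
                 capacity: int) -> List[str]:
    """Classify each sample of two cumulative bit index curves.
    transferred: bits fed into the input buffer so far.
    removed:     bits taken out by the decoder so far.
    capacity:    storage capacity of the input buffer, in bits."""
    states = []
    for bits_in, bits_out in zip(transferred, removed):
        if bits_in - capacity > bits_out:      # condition (a) violated
            states.append("overflow")
        elif bits_out > bits_in:               # condition (b) violated
            states.append("underflow")
        else:
            states.append("ok")
    return states


capacity = 46 * 1024 * 8                       # 46K bytes = 376832 bits
transferred = [100000, 300000, 500000, 600000]  # made-up samples
removed = [0, 0, 100000, 650000]                # made-up samples
print(check_buffer(transferred, removed, capacity))
# ['ok', 'ok', 'overflow', 'underflow']
```

At the third sample the buffer would have to hold 400000 bits, more
than its capacity; at the fourth the decoder asks for more than has
arrived.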

However, as illustrated in Figure 6B, an overflow or an
underflow can sometimes occur in buffering. In Figure 6B,
the transfer rate at which the video stream is received
from the medium 5 varies with time. The video stream is
otherwise similar to that shown in Figure 6A. Initially,
the video input buffer 602 receives an excess amount of
video stream compared with that required by the video
decoder 605, with the result that the video input buffer
overflows at the point indicated by the letter A. Later,
the transfer rate of the video stream received by the
video input buffer falls below the demand of the video
decoder for the video stream, with the result that the
video input buffer underflows at the point indicated by
the letter B.

By controlling various ones of the parameters
involved, input buffer overflow or underflow can be
prevented. Some ways of preventing overflow or underflow
are illustrated in the bit index curves shown in Figures
7A through 7C.

The first method, illustrated in Figure 7A, is called
the medium slave method. In this method, the amount of
the video stream transferred from the medium 5 to the
video input buffer 602 is controlled to prevent an
overflow or underflow from occurring. Without such
control, the transfer rate is indicated by the curve L1.
With control, the transfer rate is that indicated by the
curve L1'. The amount of the video stream transferred
from the medium is controlled so that the following two
conditions are satisfied:

(a) the difference between the amount of the video
stream (indicated by the curve L1') transferred into the
video input buffer 602 from the medium and the storage
capacity of the video input buffer does not exceed the
amount of the video stream (indicated by the curve L3)
removed from the video input buffer by the video decoder
605, i.e., there is no overflow; and

(b) the amount of the video stream (indicated by the
curve L3) removed by the video decoder 605 from the video
input buffer 602 does not exceed the amount of the video
stream (indicated by the curve L1') transferred into the
video input buffer 602, i.e., there is no underflow.
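
One simple way of deriving a controlled transfer curve L1' that
satisfies both conditions is to clamp the uncontrolled curve L1
between the removal curve L3 and L3 plus the storage capacity. The
sketch below is an assumption made for illustration, not the control
law of the medium slave method itself; the function name and sample
values are hypothetical.

```python
from typing import List, Sequence


def medium_slave_transfer(uncontrolled: Sequence[int],
                          removed: Sequence[int],
                          capacity: int) -> List[int]:
    """Clamp the cumulative transfer L1 so that, at every sample,
    removed <= transferred <= removed + capacity."""
    return [min(max(l1, l3), l3 + capacity)
            for l1, l3 in zip(uncontrolled, removed)]


l1 = [100, 300, 500, 520]   # uncontrolled transfer (curve L1), made up
l3 = [0, 50, 100, 600]      # removal by the decoder (curve L3), made up
print(medium_slave_transfer(l1, l3, capacity=300))
# [100, 300, 400, 600]
```

At the third sample the transfer is held back to avoid an overflow,
and at the fourth it is advanced to avoid an underflow.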

The curve L2 shows how controlling the amount of the
video stream transferred into the video input buffer 602
from the medium controls the difference between the amount
of the video stream transferred into the video input
buffer and the storage capacity of the video input buffer.
The curve L2' shows this difference when the amount of the
video stream transferred into the video input buffer from
the medium is not controlled.

The second method, illustrated in Figure 7B, is called
the decoder slave method. In this method, the picture
rate of the video decoder is controlled to change the
amount of the video stream removed from the video input
buffer by the video decoder. The picture rate is
controlled such that the following two conditions are both
met:

(a) the amount of video stream (indicated by the curve
L2), which is the difference between the amount of the
video stream (indicated by the curve L1) fed into the
video input buffer 602 and the storage capacity of the
video input buffer, does not exceed the amount of the
video stream (indicated by the curve L3') removed from the
video input buffer by the video decoder 605, i.e., there is
no overflow; and

(b) the amount of the video stream (indicated by the
curve L3') removed from the video input buffer by the
video decoder does not exceed the amount of the video
stream (indicated by the curve L1) transferred into the
video input buffer 602 from the medium, i.e., there is no
underflow.

The actual amounts of the video stream removed from
the video input buffer by the video decoder are indicated
by the curve L3'.

The above explanation is made with reference to the
video stream, but similar results can be obtained for the
audio stream by changing the sampling rate of the audio
decoder 606 to adjust the rate at which the audio stream
is removed from the audio input buffer 603.
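
A decoder slave adjustment of this kind can be sketched as a small,
bounded correction to the picture period driven by the buffer
occupancy. The proportional rule, the half-full target and the 1%
limit below are all assumptions chosen for illustration; the text
does not specify a particular control law.

```python
def adjusted_picture_period(nominal_period_ms: float, occupancy_bits: int,
                            capacity_bits: int,
                            max_adjust: float = 0.01) -> float:
    """Picture period scaled by at most +/- max_adjust: shorter when
    the input buffer is more than half full (pictures are removed
    faster), longer when it is less than half full."""
    error = occupancy_bits / capacity_bits - 0.5   # -0.5 empty .. +0.5 full
    return nominal_period_ms * (1.0 - 2.0 * max_adjust * error)


capacity = 46 * 1024 * 8
print(adjusted_picture_period(33.4, capacity // 2, capacity))  # nominal
print(adjusted_picture_period(33.4, capacity, capacity))       # shorter
print(adjusted_picture_period(33.4, 0, capacity))              # longer
```

The same rule could be applied to the sampling period of the audio
decoder, with the audio input buffer occupancy as the input.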

The third method, illustrated in Figure 7C, adjusts the
amount of the video stream removed from the video input
buffer 602 by the video decoder 605. For example, the
method may cause the video decoder to skip decoding
portions of the video stream or to repeat decoding
portions of the video stream to adjust the amount of the
video stream removed from the video input buffer.

The curve L3' shows the changes in the amount of the
video stream removed from the video input buffer 602. To
prevent an overflow from occurring early in the sequence,
the amount of the video stream removed from the video
input buffer is increased by removing some video access
units from the video input buffer but not decoding them.
Later, to prevent an underflow, the amount of the video
stream removed from the input buffer is reduced by
removing some video access units from the video input
buffer and decoding them more than once. This provides
additional pictures without removing video access units
from the video input buffer.

Changing the picture rate of the video decoder, the
sampling rate of the audio decoder, or the transfer rate
of the multiplexed bit stream from the medium 5, as just
described, causes undesirable side effects on the systems
external to the video and audio signal processing system
110. Therefore, the changes just described cannot be made
freely, and may only be made within a limited range.
Consequently, it is desirable to control the multiplexed
bit stream produced by the encoder so that the buffering
requirements in the decoder can be met comfortably without
having to resort to the correction methods just described.

Malfunctions in the buffering process are most likely
to occur at the start of decoding. An underflow will
result if the decoder attempts to remove an access unit of
the stream from the input buffer before the whole of that
access unit has been transferred into the input buffer from
the medium. To prevent this, the decoding processing is
started only after a certain delay time has elapsed after
transfer of the bit stream from the medium has begun.
This allows the audio stream and the video stream to
accumulate in the respective audio and video input buffers
before the respective decoders start removing units of the
audio stream and the video stream for decoding.

Figures 8A through 8D show some effects of a startup
delay on buffering. Figure 8A shows ideal buffering,
similar to that shown in Figure 6A. Figure 8B shows the
beneficial effect of a suitable startup delay when the
multiplexed bit stream is transferred from the medium at a
varying transfer rate. In Figure 8B, the startup delay
allows additional video stream to accumulate in the video
input buffer 602 before the video decoder 605 starts to
remove access units of the video stream from the video
input buffer.

Care must be exercised in determining the optimum
startup delay. Figure 8C shows the effect of an
excessively long startup delay. In Figure 8C, the video
decoder 605 waits too long before it starts to remove the
video stream from the video input buffer 602. As a
result, an overflow occurs at point C. Figure 8D shows
the effect of a startup delay that is too short. The short
startup delay does not allow sufficient video stream to
accumulate in the video input buffer before the video
decoder starts to remove the video stream from the video
input buffer for decoding. As a result, insufficient
video stream has accumulated in the video input buffer
when the video decoder tries to remove the video stream of
the first I-picture I2, and an underflow occurs at point
D. Figure 8D also shows that, with a suitable start-up
delay, the video stream of the first I-picture I2 can be
removed without causing an underflow.
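The effect of the startup delay can be illustrated with a small
simulation. The sketch below is an illustration only, under
simplifying assumptions that are not taken from the figures: a
constant transfer rate, instantaneous removal of one picture per
picture period, and hypothetical picture sizes and buffer capacity.

```python
def simulate_startup(picture_bits, rate, period, capacity, delay):
    """Classify a startup delay as 'ok', 'underflow' or 'overflow'.

    Bits arrive at a constant `rate` until the whole stream has arrived.
    Starting at time `delay`, one picture is removed instantaneously every
    `period`. All parameter names and values are hypothetical.
    """
    total = sum(picture_bits)
    removed = 0
    for k, bits in enumerate(picture_bits):
        t = delay + k * period
        arrived = min(rate * t, total)
        fullness = arrived - removed  # peak fullness since the last removal
        if fullness > capacity:
            return "overflow"   # decoder waited too long (compare Figure 8C)
        if fullness < bits:
            return "underflow"  # picture not yet fully buffered (Figure 8D)
        removed += bits
    return "ok"


if __name__ == "__main__":
    pictures = [400, 100, 100, 100]  # a large first I-picture, then smaller ones
    for delay in (1, 2, 4):
        print(delay, simulate_startup(pictures, 200, 1, 600, delay))
```

With these assumed values, a delay of 1 period underflows on the
first picture, a delay of 2 periods succeeds, and a delay of 4
periods overflows the buffer before decoding starts.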

Figure 9 illustrates in detail how the multiplexed bit
stream transferred from the medium 5 is processed by the
demultiplexer 601, the video input buffer 602, and the
video decoder 605 to decode the video stream in the
multiplexed bit stream. The circuit arrangement of the
demultiplexer 601, the video input buffer 602, and the
video decoder 605 is shown at the top of the drawing.

An example of a portion of the multiplexed bit stream
is shown at the left side of the drawing. The portion of
the multiplexed bit stream includes all of the pack n,
and the beginning part of the pack n+1. Each pack begins
with the pack header, which includes the clock reference
SCR, which shows the decoding timing of the pack.

The pack n begins with the pack header (Pack Header
n), and contains the video packet m, which, in turn,
contains the video stream for the pictures i and i+1. The
video packet m begins with the video packet header
(V. Packet H), which includes the presentation time stamp
PTSm and the decoding time stamp DTSm.

The pack n+1 follows the pack n, and includes the
pack header (Pack Head n+1), which includes the clock
reference SCRn+1. Following the pack header are the video
packets m+1 and m+2, and possibly more video packets.
Each of the video packets m+1 and m+2 includes a packet
header including a decoding time stamp DTS, and the video
stream of one picture.

Figure 9 also shows the bit index curves for the input
(marked IV) and the output (marked OV) of the video input
buffer 602. Various events in the multiplexed bit stream
are linked to the bit index curves with broken lines, and
are also shown on the x-axis of the bit index curve. The
bit index curve IV represents the bit index of the video
stream transferred to the video input buffer 602 from the
medium 5 via the demultiplexer 601. The bit index curve
OV represents the bit index of the video stream removed
from the video input buffer by the video decoder 605.

The multiplexed bit stream is processed as follows:
at the timing indicated by the clock reference SCRn in the
pack header of the pack n, the video stream contained in
the pack n, i.e., the video stream of the pictures i and
i+1, is transferred via the demultiplexer 601 to the video
input buffer 602. Then, at the timing indicated by the
clock reference SCRn+1, the video stream contained in the
pack n+1 is transferred into the video input buffer 602
via the demultiplexer 601. The time stamps in the video
packet headers are stored elsewhere.

Later, at the time indicated by the decoding time
stamp DTSm in the header of the video packet m, the video
stream of the picture i is instantaneously removed from
the video input buffer 602 by the video decoder 605.
Then, one picture period later, the video stream of the
picture i+1, which was also included in the video packet
m, is removed from the video input buffer by the video
decoder. Later, at the timing indicated by the decoding
time stamp DTSm+1 included in the packet header of the
video packet m+1, the video stream of the picture i+2,
which is the first picture beginning in the video packet
m+1, is removed from the video input buffer 602 by the
video decoder 605.

At the time indicated by the decoding time stamp
DTSm+2 in the packet header of the video packet m+2, the
video stream of the picture i+3, which is the first
picture beginning in the video packet m+2, is removed from
the video input buffer 602 by the video decoder 605.
Following removal of the video stream of the picture i+3,
the video streams of the pictures whose video streams
follow the video stream of the picture i+3 in the video
packet m+2 are removed from the video input buffer 602 at
times that are increments of one picture period later than
the time indicated by the decoding time stamp DTSm+2.

The timings indicated by the time stamps may be stored
as absolute timings using, for example, a crystal
oscillator and a reference clock of 90 kHz. In this way it
is possible to use the difference between the clock
reference and the time stamps as the start-up delay.
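As a worked illustration of the preceding paragraph, the start-up
delay can be computed as the difference between a decoding time
stamp and a clock reference, both counted in periods of the 90 kHz
reference clock. This is a minimal sketch; the function and
parameter names are hypothetical, and counter wrap-around is
deliberately ignored.

```python
CLOCK_HZ = 90_000  # reference clock frequency stated in the text


def startup_delay_seconds(scr_ticks, dts_ticks, clock_hz=CLOCK_HZ):
    """Return the start-up delay implied by a clock reference and a time stamp.

    scr_ticks -- clock reference (SCR) value, in reference clock periods
    dts_ticks -- decoding time stamp (DTS) value, in the same units
    Assumes both values come from the same non-wrapping counter.
    """
    if dts_ticks < scr_ticks:
        raise ValueError("time stamp precedes the clock reference")
    return (dts_ticks - scr_ticks) / clock_hz


if __name__ == "__main__":
    # A DTS that is 45 000 clock periods after the SCR gives a 0.5 s delay.
    print(startup_delay_seconds(0, 45_000))
```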

As mentioned above, when a decoder according to the
MPEG standard is used for decoding an audio stream and a
video stream, it is necessary to synchronize the times at
which units of the respective decoded signals resulting
from decoding corresponding access units of the audio
stream and the video stream are fed to the decoder output.
The time at which a decoded signal unit is fed to the
decoder output is called the presentation time of that
unit. The time stamps in the multiplexed bit stream are
used to provide this synchronization.

Part of providing the necessary synchronization
includes reordering the video signal resulting from
decoding the video stream. This is illustrated in Figure
10. As mentioned above, the video stream includes the
video streams of pictures that are compressed as
I-pictures, as P-pictures, and as B-pictures. Of these
pictures, the decoding time and the presentation time are
only the same for B-pictures. Incidentally, the decoding
time and the presentation time are also the same for the
audio stream. I-pictures and P-pictures have a
presentation time that is later by a number of picture
periods than the decoding time. The video decoder 605
removes the video stream of an I-picture or a P-picture
from the video input buffer 602 at the time indicated by
the decoding time stamp DTS. After the video stream of a
picture has been decoded, the resulting decoded video
signal is temporarily stored in the video decoder output
buffer 611. Then, at the time indicated by a presentation
time stamp PTS, the video signal of the picture is fed
from the video decoder output buffer to the output of the
video decoder 605 to provide a picture of the video output
signal.

For example, in Figure 10, the video stream of the
I-picture I2 is removed from the video input buffer 602 at
the time indicated by the decoding time stamp DTSm for
decoding, and the resulting video signal is stored in the
output buffer 611 provided in the video decoder 605 for
temporarily storing the video signals of decoded
I-pictures and P-pictures.

Then, the video decoder 605 consecutively removes the
video streams of the B-pictures B0 and B1 from the video
input buffer 602, consecutively decodes them, and feeds
the resulting video signals to its output one picture
period apart.

Next, the video decoder 605 removes the video stream
of the P-picture P5 from the video input buffer 602. The
video decoder instantaneously decodes the video stream,
and stores the resulting video signal in the output buffer
611. Also, at the time indicated by the presentation time
stamp PTS of the I-picture I2, which has the same value as
the decoding time stamp of the P-picture P5, the video
decoder feeds the video signal of the picture I2 to its
output.

Finally, in this example, the video decoder 605
consecutively removes the video streams of the B-pictures
B3 and B4 from the video input buffer 602, consecutively
decodes them using the stored pictures I2 and P5 as
reference pictures, and feeds the resulting video signals
to its output one picture period apart.

Since the video streams of I-pictures and P-pictures
differ in their decoding timing and their presentation
timing, a presentation time stamp and a decoding time
stamp, respectively indicating the presentation time and
the decoding time, are included in the video packet
headers of the video packets in which the video streams of
I-pictures or P-pictures begin. However, both types of
time stamps need not be included, because, according to
the MPEG decoding rules, the presentation time of each
I-picture or P-picture is the same as the decoding time of
the following I-picture or P-picture. In other words, the
decoding time stamps can be omitted, and each I-picture or
P-picture can be decoded at the time indicated by the
presentation time stamp of the previous I-picture or
P-picture.

Figure 10 also shows the consequence of the differing
decoding and presentation times of the MPEG video signal.
It can be seen from the bit index curve that the video
decoder removes the video streams of the pictures from the
video input buffer in the order in which they were
transferred into the input buffer from the medium 5, i.e.,
in non-sequential picture order. However, the
presentation time stamps of the pictures cause the
pictures to be displayed in their sequential order shown
at the bottom of the figure.
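The reordering described for Figure 10 can be sketched as
follows. This is an illustrative model only, not the decoder's
actual circuitry: it assumes picture names whose first letter
gives the picture type, presents each B-picture as soon as it is
decoded, and holds each I-picture or P-picture until the next
I-picture or P-picture is decoded.

```python
def presentation_order(decode_order):
    """Convert a decoding order into a presentation order.

    decode_order -- picture names such as "I2", "B0", "P5" (hypothetical
                    naming; the first letter is the picture type).
    """
    held = None  # decoded I- or P-picture waiting in the output buffer
    presented = []
    for picture in decode_order:
        if picture.startswith("B"):
            presented.append(picture)  # B-pictures are presented on decoding
        else:
            if held is not None:
                presented.append(held)  # release the previous I/P picture
            held = picture
    if held is not None:
        presented.append(held)  # flush the last held picture
    return presented


if __name__ == "__main__":
    print(presentation_order(["I2", "B0", "B1", "P5", "B3", "B4"]))
```

For the decoding order of the example above, the sketch yields
the sequential order B0, B1, I2, B3, B4, P5.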

As stated above, the time stamps are included in the
multiplex layer of the multiplexed bit stream, and not in
the audio or video stream layer. This means that when the
multiplexed bit stream is demultiplexed in the decoder, the
correlation between the time stamps and the access units
to which they pertain is lost. The decoder must therefore
include a provision to link the time stamps extracted from
the multiplexed bit stream with their respective access
units. One approach is shown in Figures 11A and 11B.

In Figure 11A, the decoder 600 includes the
demultiplexer 601, which receives the multiplexed bit
stream from the medium 5. The demultiplexer demultiplexes
the video stream and the video time stamps from the
multiplexed bit stream and feeds these into the video
stream reconfiguration unit 692. The demultiplexer also
demultiplexes the audio stream and the audio time stamps
from the multiplexed bit stream and feeds these into the
audio stream reconfiguration unit 693. The output of the
video stream reconfiguration unit is fed into the video
input buffer 602, which precedes the video decoder 605.
The decoding in the video decoder is controlled by the
picture rate control circuit 698 in response to the video
time stamps. The output of the audio stream
reconfiguration unit 693 is fed into the audio input
buffer 603, which precedes the audio decoder 606.
Decoding in the audio decoder is controlled by the
sampling rate control circuit 699 in response to the audio
time stamps.

The demultiplexer 601 receives the multiplexed bit
stream 85 from the medium 5 and separates it into the
video stream, the video time stamps, the audio stream, and
the audio time stamps. The video stream and the video
time stamps are fed into the video stream reconfiguration
unit 692, which inserts the video time stamps into the
video stream. For example, a video time stamp is inserted
between the picture i and the picture i+1 shown in Figure
11B. The video stream, reconfigured as shown in Figure
11B, is fed to the video input buffer 602, where it is
temporarily stored. The video decoder 605 removes the
video stream, including the video time stamps, from the
video input buffer 602 in the order in which it was
received by the video input buffer.

In a similar manner, the audio stream reconfiguration
unit 693 receives the audio stream and the audio time
stamps from the demultiplexer 601 and inserts the audio time
stamps into the audio stream. For example, an audio time
stamp is inserted between the access unit j and the access
unit j+1 of the audio stream shown in Figure 11B. The
audio stream, reconfigured as shown in Figure 11B, is then
fed from the audio stream reconfiguration unit to the
audio input buffer 603, where it is temporarily stored.
The audio decoder 606 removes the audio stream, including
the audio time stamps, from the audio input buffer in the
order in which it was received by the audio input buffer.

The video decoder 605 decodes the video stream removed
from the video input buffer 602 in response to timing
signals received from the picture rate control circuit
698. The picture rate control circuit is, in turn,
controlled by the video time stamps fed from the video
decoder. Similarly, the audio decoder 606 decodes the audio
stream removed from the audio input buffer 603 in response
to timing signals received from the sampling rate control
circuit 699. The sampling rate control circuit is, in turn,
controlled by the audio time stamps fed from the audio
decoder.

The decoder just described solves the problem of
correlating the time stamps included in the multiplex
layer with the video and audio access units to which they
belong. However, embedding the time stamps into the audio
and video streams results in streams that are no longer
standard. A decoder that is suitable for decoding, for
example, a video stream with embedded time stamps would be
unsuitable for decoding a video stream in an application
in which time stamps are not used. It is therefore
preferable to correlate the time stamps with the access
units to which they belong in a way that does not result in
a non-standard stream and a non-standard decoder.

Recently, the MPEG standards have permitted packets of
information other than an audio stream or a video stream
to be included in the multiplexed bit stream. For
example, packets of directory information may be added to
the bit stream. Directory information allows pictures to
be displayed during fast forward operations by providing
the address of successive access points in the multiplexed
bit stream. An access point is an access unit that can be
decoded without requiring that another access unit be
decoded. For example, a video access point is a picture
that is wholly or partially coded using intra-picture
coding. An access point is normally located at the
beginning of each Group of Pictures.

The MPEG standards stipulate that the packets
containing directory information (directory packets) be
interleaved with the audio packets and the video packets
in the multiplexed bit stream, and also stipulate that a
directory information buffer be provided in the decoder.
However, the MPEG standards define neither the size nor
the operation of the directory buffer. Because of the
memory constraints in processors used in MPEG decoders,
decoder designers allocate relatively little memory for
buffering the directory information. Moreover, encoder
designers have customarily made the directory packets
relatively large, so that the directory packets occur
relatively rarely in the multiplexed bit stream.

The impact of the present relationship between the
directory buffer size and the size and spacing of the
directory packets on the fast-forward operation of a video
tape recorder is shown in Figures 12A to 12E. Figure 12A
shows the arrangement of part of the multiplexed bit
stream as recorded on the video tape. The directory
packet consists of the directory packet header
(Dir.Pkt.Hdr), followed by a set of directory entries, one
directory entry for each one of the following Groups of
Pictures. Following the directory packet are plural video
packets containing the video stream of the Groups of
Pictures. Since, in this example, there are 20 Groups of
Pictures following the directory packet, the directory
packet includes 20 directory entries. In these figures,
the audio packets interleaved with the video packets have
been omitted to simplify the drawing.

During the fast-forward operation, the directory
packet header is recognized, and the contents of the
directory packet are read from the tape, and transferred
into the directory buffer, as shown in Figure 12B.
However, since the directory buffer typically has a
capacity of about 500 bits, and each directory entry
typically requires about 100 bits, the directory buffer
overflows after the first five directory entries have been
stored.

After the contents of the directory packet have been
reproduced from the tape, the address of the beginning of
the first Group of Pictures (GOP 0) is read from the
directory buffer, and the tape is advanced to this address
to enable the access point at the beginning of the first
Group of Pictures to be reproduced from the tape, as shown
in Figure 12C. While this picture is being decoded for
display, the address of the beginning of the second Group
of Pictures (GOP 1) is read from the directory buffer, and
the tape is advanced to this address to enable the access
point, e.g., I-picture, at the beginning of the second
Group of Pictures to be reproduced from the tape, also as
shown in Figure 12C. This process is repeated, as shown
in Figure 12C, up to the fifth Group of Pictures (GOP 4),
after which the contents of the directory buffer are
exhausted.

Then, the tape has to be rewound back to the directory
packet to reproduce the next five of the directory
entries. These directory entries are stored in the
directory buffer, as shown in Figure 12D. The tape
recorder then uses these five new directory entries to
fast forward through the pictures at the beginnings of the
sixth through tenth Groups of Pictures (GOPs 5-9), as
shown in Figure 12E. In all, the directory packet must be
reproduced four times for the pictures at the beginning of
each of the twenty Groups of Pictures GOP 0-GOP 19 to be
reproduced.
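The arithmetic of this example can be checked with a short
calculation. The sketch below uses the approximate figures given
above (a 500-bit directory buffer, 100-bit directory entries, 20
entries per directory packet); the function name is hypothetical.

```python
def directory_passes(num_entries, entry_bits, buffer_bits):
    """Return how many times a directory packet must be reproduced.

    Each pass over the directory packet fills the directory buffer with as
    many whole entries as it can hold.
    """
    entries_per_pass = buffer_bits // entry_bits
    if entries_per_pass == 0:
        raise ValueError("buffer cannot hold even one directory entry")
    # Ceiling division without floating point
    return (num_entries + entries_per_pass - 1) // entries_per_pass


if __name__ == "__main__":
    # 500-bit buffer, 100-bit entries, 20 Groups of Pictures
    print(directory_passes(20, 100, 500))
```

With five entries per pass, twenty entries require four passes,
in agreement with the text.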

The mismatch between the directory buffer capacity
and the size and spacing of the directory packets makes
the fast forward operation an extremely slow one if
pictures are to be reproduced during the fast-forward
operation, something that is routine during the fast
forward operation in an analog video tape recorder.

Using a larger directory buffer is not a complete
solution to the problem just described (although a larger
buffer may reduce the seriousness of the problem) because
the MPEG standards do not define the size and operation of
the directory packet. Hence, no matter how large the
directory buffer is made, the possibility of a directory
packet larger than the directory buffer always exists.

As an alternative to embedding time stamps in the
audio and video streams following demultiplexing, it has
been proposed to provide time stamp buffers to store the
time stamps until they are needed. Separate buffers may
be provided for the time stamps relating to audio access
units and for the time stamps relating to video access
units. Again, the MPEG standards include no direct
specification for the size and operation of these buffers.
However, the current MPEG standards require that the
system target decoder have a maximum buffering delay of
one second for both audio and video. This means that the
time stamps need only be buffered for a maximum of one
second, which enables the maximum size of the time stamp
buffers to be calculated. If a time stamp is provided for
each picture in the video stream, a buffer capacity of 30
time stamps must be provided for the video time stamps.
Similarly, if a time stamp is provided for each audio
access unit, a buffer capacity of 115 time stamps must be
provided for the audio time stamps.
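The two capacities follow from multiplying the access unit rate
by the maximum buffering delay. The sketch below reproduces the
figures under assumptions that the text does not state: a picture
rate of 30 pictures per second, and audio access units of 384
samples at a 44.1 kHz sampling rate (about 114.8 access units per
second). The function name is hypothetical.

```python
import math


def time_stamp_buffer_capacity(units_per_second, max_delay_seconds):
    """Return the number of time stamps that must be buffered.

    One time stamp is assumed per access unit, and every time stamp is held
    for at most `max_delay_seconds`.
    """
    return math.ceil(units_per_second * max_delay_seconds)


if __name__ == "__main__":
    # Assumed rates, see the paragraph above; they are not given in the text.
    video_rate = 30            # pictures per second
    audio_rate = 44_100 / 384  # audio access units per second
    print(time_stamp_buffer_capacity(video_rate, 1))
    print(time_stamp_buffer_capacity(audio_rate, 1))
```

Under the same assumptions, a longer maximum delay scales the
required capacity proportionally, which is why the capacity is
tied to the one-second limit.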
In the manner just described, the MPEG standards
indirectly impose a maximum size on the audio and video
time stamp buffers. However, this way of setting the
maximum size of the time stamp buffers has an undesirable
side effect, namely, it makes the MPEG standards
unsuitable for use in applications in which a longer
buffer delay is necessary. For example, the low
picture-rate, low bit-rate video signal shown in Figure
13, although otherwise capable of being multiplexed
according to an MPEG-standard bit rate, cannot be
multiplexed by the MPEG standard because it requires a
decoder buffer delay of about 5 seconds.

Since the MPEG standards are meant to be used in many
applications, it is desirable to eliminate the maximum
delay requirement defined by the MPEG standard and to
establish instead a more rational way of defining the time
stamp buffer sizes.

Disclosure of Invention 

The present invention provides a method of generating
a bit stream by multiplexing non-compressed auxiliary
information with an information stream. The information
stream is obtained by compressing fixed-size units of an
information signal with a varying compression ratio to
provide varying-sized units of the information stream.
The auxiliary information is for use in subsequently
decoding the information stream. Units of the auxiliary
information correspond to the units of the information
signal. In the method, the information stream is divided
in time into information stream portions. The
non-compressed auxiliary information is also divided in
time into auxiliary information portions. The information
stream portions and the auxiliary information portions are
interleaved to provide the bit stream. Finally, the
information stream dividing, auxiliary information
dividing, and interleaving steps are controlled by
emulating decoding of the bit stream by a hypothetical
system target decoder. The hypothetical system target
decoder includes a demultiplexer that demultiplexes the
bit stream, a serial arrangement of an information stream
buffer and an information stream decoder, and a serial
arrangement of an auxiliary information buffer and an
auxiliary information processor. Each serial arrangement
is connected to the demultiplexer. The information stream
dividing, auxiliary information dividing, and
interleaving steps are controlled such that the
information stream buffer and the auxiliary information
buffer neither overflow nor underflow.

The demultiplexer receives the bit stream and extracts
from the bit stream the information stream and the
auxiliary information for feeding to the information
stream buffer and the auxiliary information buffer,
respectively. The information stream buffer and the
auxiliary information buffer respectively have a first
target size and a second target size. The information
stream decoder removes the varying-sized units of the
information stream from the information stream buffer at a
first target timing, and the auxiliary information
processor removes the corresponding fixed-sized units of
the auxiliary information from the auxiliary information
buffer at a second target timing.
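The emulation of the hypothetical system target decoder can be
pictured as replaying a candidate interleaving against two model
buffers. The following is a minimal sketch of that idea, not the
method claimed: the event representation, the buffer names
"stream" and "aux", and the sizes used are all assumptions made
for illustration.

```python
def emulate_std(events, sizes):
    """Replay buffer events and report the first overflow or underflow.

    events -- tuples (time, kind, buffer_name, bits); kind is "in" when the
              demultiplexer delivers a portion to a buffer and "out" when a
              unit is removed at its target timing (hypothetical format).
    sizes  -- dict mapping buffer_name to its target size in bits.
    Returns None if no violation occurs, else (time, buffer_name, problem).
    """
    fullness = {name: 0 for name in sizes}
    # Process in time order; at equal times, deliveries precede removals.
    for time, kind, name, bits in sorted(events, key=lambda e: (e[0], e[1] != "in")):
        if kind == "in":
            fullness[name] += bits
            if fullness[name] > sizes[name]:
                return (time, name, "overflow")
        else:
            if bits > fullness[name]:
                return (time, name, "underflow")
            fullness[name] -= bits
    return None


if __name__ == "__main__":
    sizes = {"stream": 1000, "aux": 200}  # assumed target sizes
    schedule = [
        (0, "in", "aux", 100), (0, "in", "stream", 600),
        (1, "in", "stream", 300),
        (2, "out", "stream", 500), (2, "out", "aux", 100),
    ]
    print(emulate_std(schedule, sizes))
```

An encoder controller could, under these assumptions, reject or
re-divide any candidate interleaving for which the replay reports
a violation.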

According to the method, when the bit stream is a
multi-layered bit stream, the interleaving step may
interleave the information stream portions and the
auxiliary information portions in the same one of the
layers of the bit stream, or may interleave the
information stream portions and the auxiliary information
portions in different layers of the bit stream.

The auxiliary information may be directory
information for the information stream, in which case the
information stream may include plural access points, and
each unit of the directory information would relate to one
of the access points. The information stream may comprise
plural access units, and the auxiliary information may be
a set of time stamps for decoding the access units of the
information stream.

The present invention also provides an encoder for
generating a bit stream. The encoder includes a
compressor that compresses fixed-sized units of an
information signal with a varying compression ratio to
provide varying-sized units of an information stream. An
information stream divider divides the information
stream in time into information stream portions. An
auxiliary information divider divides non-compressed
auxiliary information in time into auxiliary information
portions. The auxiliary information is for use in
subsequently decoding the information stream. Units of
the auxiliary information correspond to the units of the
information signal. A multiplexer sequentially arranges
the information stream portions and the auxiliary
information portions to provide the bit stream. The
multiplexer includes a controller that controls the
information stream divider and the auxiliary information
divider by emulating decoding of the bit stream by a
system target decoder. The system target decoder includes
a demultiplexer that demultiplexes the bit stream, a
serial arrangement of an information stream buffer and an
information stream decoder, and a serial arrangement of an
auxiliary information buffer and an auxiliary information
processor. Each of the serial arrangements is connected
to the demultiplexer. The controller controls the
information stream divider and the auxiliary information
divider such that the information stream buffer and the
information stream decoder neither underflow nor overflow.

The present invention also provides a system in which
an information signal is compressed for transfer, together
with non-compressed auxiliary information, to a medium as
a bit stream and in which the bit stream is transferred
from the medium and is processed to recover the
information signal by expansion, and to recover the
auxiliary information. The auxiliary information is for
use in recovering the information signal. The system
comprises an encoder and a decoder.

The encoder comprises an information signal
compressor that provides an information stream by
compressing fixed-sized units of the information signal with
a varying compression ratio to provide varying-sized units
of the information stream. The encoder also includes a
multiplexer that sequentially arranges time-divided
portions of the information stream and time-divided
portions of the non-compressed auxiliary information to
provide the bit stream for transfer to the medium. The
multiplexer includes a controller that determines the
division of the information stream and of the auxiliary
information into the respective time-divided portions by
emulating decoding of the bit stream by a hypothetical
system target decoder. The hypothetical system target
decoder includes a demultiplexer that demultiplexes the
bit stream, a serial arrangement of an information stream
buffer and an information stream decoder, and a serial
arrangement of an auxiliary information buffer and an
auxiliary information processor. Each serial arrangement
is connected to the demultiplexer.

The decoder is similar to the system target decoder
and includes a demultiplexer that extracts the information
stream and the auxiliary information from the bit stream
transferred from the medium. A first input buffer
receives the auxiliary information from the demultiplexer,
and a circuit removes a unit of the auxiliary
information from the first input buffer. The first input
buffer has a size of at least the size of the auxiliary
information buffer. A second input buffer receives the
information stream from the demultiplexer. The
second input buffer has a size of at least the size of the
information stream buffer. A decoder removes one of the
varying-sized units of the information stream from the
second input buffer and expands the removed unit of
the information stream to recover a unit of the
information signal.

The present invention also provides a decoder
for a bit stream obtained by multiplexing non-compressed
auxiliary information with an information stream. The
information stream is obtained by compressing fixed-size
units of an information signal with a varying compression
ratio to provide varying-sized units of the information
stream. The auxiliary information is for use in
subsequently decoding the information stream. Units of
the auxiliary information correspond to the units of the
information signal. The decoder comprises a demultiplexer
that extracts the information stream and the auxiliary
information from the bit stream. A first input buffer
receives the auxiliary information from the demultiplexer,
and a circuit removes a unit of the auxiliary information
from the first input buffer. A second input buffer
receives the information stream from the demultiplexer. A
decoder removes one of the varying-sized units of the
information stream from the second input buffer and
expands the removed unit of the information stream in
response to the unit of the auxiliary information to
recover a unit of the information signal.

The present invention further provides a method of
deriving a multiplexed bit stream from an information
signal. In the method, an encoder is provided. The
encoder includes a compressor that compresses units of the
information signal to provide access units of an
information stream. A first buffer having a first size
buffers the access units of the information stream. A
circuit provides a time stamp each time the first buffer
receives an access unit of the information stream. A
second buffer having a second size buffers the time
stamps. A multiplexer multiplexes the information stream
and the time stamps to provide the multiplexed bit stream.

A hypothetical system target decoder for decoding the
multiplexed bit stream is defined. The hypothetical
system target decoder includes a demultiplexer for
demultiplexing the bit stream, a serial arrangement of an
information stream buffer and an information stream
decoder, and a serial arrangement of a time stamp buffer
and a time stamp processor. Each serial arrangement is
connected to the demultiplexer. The size of the first
buffer and the size of the second buffer are determined by
emulating decoding of the bit stream using the
hypothetical system target decoder. Then, the information
signal is encoded using the encoder with the size of the
first buffer and the size of the second buffer set to the
respective sizes determined by the determining step.

Finally, the present invention provides a method for
deriving a bit stream from an information signal. In the
method, units of the information signal are compressed to
provide units of an information stream. The units of the
information stream include access points. Pointers
pointing to the access points in the information stream are
derived from the information stream. Then, the
information stream divided into information packets is
multiplexed together with pointer packets to provide the
bit stream. The multiplexing is performed such that a set
of information packets containing plural consecutive
access points is multiplexed adjacent a pointer packet
containing the pointers pointing only to the plural
consecutive access points.
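
The grouping described in the preceding paragraph can be sketched as
follows. This is a minimal illustration only, under assumptions that
are not taken from the specification: each information packet is
modelled as a (payload, is_access_point) pair, the pointer packet is
placed immediately ahead of its set, and each pointer is a packet
offset counted from the pointer packet. The names build_set and
multiplex_with_pointers are hypothetical.

```python
def build_set(packets):
    """Prefix one set of information packets with its pointer packet.

    Assumption: a pointer is the offset, in packets, from the pointer
    packet to an access point inside the set that follows it.
    """
    offsets = [i + 1 for i, (_, is_ap) in enumerate(packets) if is_ap]
    return [("PTR", offsets)] + [("INFO", payload) for payload, _ in packets]


def multiplex_with_pointers(info_packets, points_per_set):
    """Split the packets into sets holding `points_per_set` consecutive
    access points, and multiplex each set adjacent to a pointer packet
    that points only to the access points of that set."""
    stream, current, count = [], [], 0
    for payload, is_ap in info_packets:
        # A new set starts at the access point that would exceed the quota.
        if is_ap and count == points_per_set:
            stream.extend(build_set(current))
            current, count = [], 0
        current.append((payload, is_ap))
        count += is_ap
    if current:
        stream.extend(build_set(current))
    return stream
```

With two access points per set, five packets containing three access
points yield two pointer packets, the second of which points to a
single access point.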

Brief Description of Drawings 

Figure 1 is a block diagram of an encode/decode system
for an audio signal and a video signal showing the
relationship between the system and a system target
decoder according to the prior art.

Figure 2 shows the structure of the multiplexed bit
stream produced by the encoder of the system shown in
Figure 1.

Figure 3 shows the structure of the decoder of the
system shown in Figure 1.

Figure 4A shows the audio input buffer and the audio
decoder in the decoder of the system shown in Figure 1.

Figure 4B is a bit index curve showing the average bit
index at the input of the audio input buffer in the
decoder of the system shown in Figure 1.

Figure 4C is a bit index curve showing the actual bit
index at the input of the audio input buffer in the
decoder of the system shown in Figure 1.

Figure 4D is a bit index curve showing the bit index
at the output of the audio input buffer in the decoder
of the system shown in Figure 1.

Figure 5A shows the video input buffer and video 
decoder in the decoder of the system shown in Figure 1. 

Figure 5B is a bit index curve showing the average bit
index at the input of the video input buffer in the
decoder of the system shown in Figure 1.

Figure 5C is a bit index curve showing the actual bit
index at the input of the video input buffer in the
decoder of the system shown in Figure 1.

Figure 5D is a bit index curve showing the bit index
at the output of the video input buffer.

Figure 6A shows ideal buffering in the video input
buffer in the decoder of the system shown in Figure 1.

Figure 6B shows the effect of a changing input bit
rate on the buffering provided by the video input buffer
in the decoder of the system shown in Figure 1.

Figures 7A, 7B, and 7C show various ways of remedying
buffering errors in the video input buffer in the decoder
of the system shown in Figure 1.

Figures 8A, 8B, 8C, and 8D show the effect of the
buffering start-up delay on the buffering provided by the
video input buffer in the decoder of the system shown in
Figure 1.

Figure 9 shows the relationship between the structure
of the multiplexed bit stream and the operation of the
video input buffer in the decoder of the system shown in
Figure 1.

Figure 10 shows the relationship between various types
of picture encoding and the operation of the video input
buffer in the decoder of the system shown in Figure 1.

Figure 11A shows an alternative structure for the
decoder of the system shown in Figure 1, in which, after
demultiplexing the multiplexed bit stream, the respective
time stamps are embedded into the video and audio streams.

Figure 11B shows the audio and video streams with
embedded time stamps produced by the decoder shown in
Figure 11A.

Figures 12A and 12B show the effect of the known way of
multiplexing directory packets into the multiplexed bit
stream on the fast-forward operation of a video tape
recorder.

Figure 13 shows a low bit rate that cannot be decoded
using a decoder conforming with the buffering delay limit
imposed by the MPEG-1 standard.

Figure 14 is a block diagram of a first embodiment of
an encode/decode system according to the invention for an
audio signal and a video signal, showing the relationship
between the system and a first embodiment of a system
target decoder according to the invention.

Figure 15 shows the structure of a first embodiment of
an encoder according to the invention showing the
reference of various elements of the encoder to the system
target decoder according to the invention.

Figure 16A shows the preliminary multiplexed bit
stream generated by the encoder shown in Figure 15.

Figure 16B shows the multiplexed bit stream generated
by the encoder shown in Figure 15.

Figure 17 is a block diagram of a first embodiment of
a decoder according to the invention.

Figure 18 shows the bit index at the input of the
video input buffer and at the input and the output of the
directory input buffer in the first embodiment of the
decoder shown in Figure 17.

Figure 19 shows the relationship between the structure
of the multiplexed bit stream produced by the first
embodiment of the encoder shown in Figure 15 and the bit
indices of the input of the video input buffer and the
input and the output of the directory input buffer in the
first embodiment of the decoder shown in Figure 17.

Figure 20 shows the effect of the way of multiplexing
directory packets into the multiplexed bit stream
according to the invention on the fast-forward operation
of a video tape recorder.

Figure 21 is a block diagram of a second embodiment of
an encode/decode system according to the invention for an
audio signal and a video signal, showing the relationship
between the system and a second embodiment of a system
target decoder according to the invention.

Figure 22A shows the structure of a second embodiment
of an encoder according to the invention showing the
various operational parameters of the encoder determined
by reference to the second embodiment of the system target
decoder according to the invention.

Figure 22B is a block diagram illustrating the process
by which the operational parameters of the encoder shown
in Figure 22A are determined with reference to the second
embodiment of the system target decoder according to the
invention.

Figure 23 is a block diagram of a second embodiment of
a decoder according to the invention.

Figure 24A illustrates the components of the total
video delay of the encode/decode system.

Figure 24B illustrates the components of the total
video delay and the total audio delay of the encode/decode
system according to the invention.

Figure 25 shows the relationship between the structure
of the multiplexed bit stream produced by the second
embodiment of the encoder shown in Figure 22A and the bit
indices of the input of the video input buffer and the
input and the output of the video time stamp buffer in the
second embodiment of the decoder shown in Figure 23.

Best Mode for Carrying Out the Invention

The present invention expands the definition of the
system target decoder (STD) to include an input buffer and
a decoder for each stream of non-compressed auxiliary
information, such as time stamps and directory
information, in addition to the input buffer and decoder
for the audio stream and the input buffer and decoder for
the video stream. As a consequence of the redefined STD,
a practical decoder according to the invention will
include an input buffer and a decoder for each stream of
auxiliary information in addition to the respective input
buffer and decoder for each of the audio stream and the
video stream. Finally, an encoder according to the
invention multiplexes the audio stream, the video stream,
and each of the auxiliary information streams taking
account of the parameters of the modified STD according to
the invention.

This approach allows many different types of auxiliary
information streams to be included in the multiplexed bit
stream provided that (a) an input buffer and a decoder are
provided in the system target decoder for each auxiliary
information stream, and (b) each auxiliary information
stream is included in the multiplexed bit stream such that
none of the input buffers in the STD overflows or
underflows.
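
Condition (b) can be illustrated with a small check that replays the
deliveries into, and the removals from, one STD input buffer. This is
a sketch under stated assumptions rather than the emulation defined by
the invention: events are hypothetical (time, delta_bits) pairs, with
a positive delta for bits delivered by the demultiplexer and a
negative delta for a unit removed by the decoder, and the function
name check_buffer is illustrative.

```python
def check_buffer(size_bits, events):
    """Replay buffer events in time order and report the first violation.

    Returns "overflow" if the fullness exceeds the buffer size,
    "underflow" if a removal asks for more bits than are stored,
    and "ok" otherwise.
    """
    fullness = 0
    # sorted() is stable, so events with equal times keep their given order.
    for _, delta_bits in sorted(events, key=lambda event: event[0]):
        fullness += delta_bits
        if fullness > size_bits:
            return "overflow"
        if fullness < 0:
            return "underflow"
    return "ok"
```

An encoder would apply such a check to every input buffer of the STD,
including the buffer for each auxiliary information stream, before
committing to a particular multiplexing order.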

A first embodiment of an encode/decode signal
processing system 10 according to the invention, in which
a directory input buffer and a directory decoder are
provided according to the invention in the system target
decoder, is shown in Figure 14.

In this system, the encoder 1 receives the video signal S2
from the video signal storage medium 2, and receives the
audio signal S3 from the audio signal storage medium 3.
The audio signal S3 could alternatively be (and is more
usually) also received from the video signal storage
medium 2 instead of from a separate audio storage medium.
The encoder 1 compresses and codes the video and audio
signals, and multiplexes the resulting audio stream and
video stream to provide the multiplexed bit stream S1,
which is fed to the medium 5 for storage or distribution.
The medium can be any medium suitable for storing or
distributing a digital bit stream, for example, a CD-ROM,
a laser disk (LD), a video tape, a magneto-optical (MO)
storage medium, a digital compact cassette (DCC), a
terrestrial or satellite broadcasting system, a cable
system, a fibre-optic distribution system, a telephone
system, an ISDN system, etc.

The encoder 1 compresses and codes the video signal
picture-by-picture. Each picture of the video signal is
compressed in one of three compression modes. A picture
compressed in the intra-picture compression mode is called
an I-picture. In the intra-picture compression mode, the
picture is compressed by itself without reference to other
pictures of the video signal. Pictures compressed in the
inter-picture compression mode are called P-pictures or
B-pictures. A P-picture is compressed using forward
prediction coding using as a reference picture a previous
I-picture or P-picture, i.e., a picture occurring earlier
in the video signal. A B-picture is compressed using
bidirectional prediction coding. Each block of the
B-picture may use as a reference block any one of the
following: a block of a previous I-picture or P-picture,
a block of a following P-picture or I-picture (i.e., a
picture occurring later in the video signal), or a block
obtained by performing linear processing on a block of a
previous I-picture or P-picture and a block of a following
I-picture or P-picture. Typically, about 150 Kbits (Kb; 1
Kb = 1024 bits) of the video stream are required for an
I-picture, 75 Kb of the video stream are required for a
P-picture, and 5 Kb of the video stream are required for a
B-picture.
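
To give a feel for these typical figures, the following sketch adds up
the bits required by a run of pictures. The per-picture sizes are the
typical values stated above; the 15-picture pattern used in the usage
note is an assumption chosen for illustration and is not taken from
the specification, and the name sequence_bits is hypothetical.

```python
KBIT = 1024  # 1 Kb = 1024 bits, as defined in the text

# Typical video stream requirement per picture type, in Kb.
TYPICAL_KB = {"I": 150, "P": 75, "B": 5}


def sequence_bits(pattern):
    """Total bits for a sequence of pictures written as a string of
    picture types, e.g. "IBBP"."""
    return sum(TYPICAL_KB[picture_type] for picture_type in pattern) * KBIT
```

For the assumed pattern "IBBPBBPBBPBBPBB" (one I-picture, four
P-pictures and ten B-pictures), the total is 500 Kb, of which the
single I-picture accounts for 150 Kb. This spread is why the amount
removed from the video input buffer differs so much from picture to
picture.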

The digital video and audio processing system 10 also
includes the decoder 6, which receives as its input signal
the bit stream S5 from the medium 5. The decoder 6
performs demultiplexing inverse to the multiplexing
performed by the encoder 1. The decoder performs
processing complementary to that performed by the encoder
1 to decode and expand the resulting audio stream and
video stream to provide the recovered video signal S6A and
the recovered audio signal S6B respectively. The
recovered video signal S6A and the recovered audio signal
S6B closely match the video signal S2 and the audio signal
S3 fed into the encoder 1.

Figure 14 also shows the system target decoder (STD) 4,
which is used to define the processing performed by the
encoder 1 and the decoder 6. In practical video and audio
signal processing systems, the encoder does not include an
actual system target decoder, but instead performs the
encoding processing and multiplexing taking account of the
system target decoder parameters. Also, in practical
systems, the decoder is designed taking the system target
decoder parameters into account. These relationships
between the system target decoder and the encoder and the
decoder are indicated in Figure 14 by the broken line
labelled S4A interconnecting the system target decoder 4
and the encoder 1, and the broken line labelled S4B
interconnecting the system target decoder 4 and the
decoder 6.

The system target decoder 4 includes a reference video
decoder, a reference audio decoder, and their respective
input buffers. In addition, the system target decoder
includes a directory decoder and an input buffer for the
directory decoder. The size of the audio input buffer,
the size of the video input buffer, and the operation of
the audio and video decoders are defined by the MPEG
standards. In addition, the invention defines the size of
the directory buffer and the operation of the directory
decoder to make them compatible with the sizes of the
other buffers and the operation of the other decoders
defined by the MPEG standard.

As mentioned above, the concept of the system target
decoder provides compatibility between encoders and
decoders of different designs as follows. All encoders
are designed to provide a bit stream that can be
successfully decoded by the system target decoder, and
that does not cause the respective input buffers in the
system target decoder to overflow or underflow. In
addition, all decoders are designed taking the system
target decoder parameters into account. As a result, all
such decoders will be capable of successfully decoding the
bit stream produced by any of the encoders designed to
produce a bit stream capable of being decoded by the
system target decoder. By including a directory buffer
and a directory decoder in the STD, the invention enables
encoders and decoders to be made compatible with one
another in an additional respect, namely, that of
providing and decoding directory information.

The structure of the hypothetical system target
decoder 4 shown in Figure 14 is as follows. The
demultiplexer 41 notionally receives the bit stream S1
from the encoder 1. The demultiplexer 41 demultiplexes
the bit stream into a video stream S1V, an audio stream
S1A, and a directory stream S1D. The video stream is fed
to the video input buffer 42, the output of which is
connected to the video decoder 45. The audio stream from
the demultiplexer 41 is fed into the audio input buffer
43, the output of which is connected to the audio decoder
46. The directory stream from the demultiplexer 41 is fed
into the directory input buffer 44, the output of which is
connected to the directory decoder 47.
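
The routing just described, one demultiplexer feeding three serial
buffer and decoder arrangements, can be modelled in a few lines. This
is a structural sketch only: the packet representation as a
(stream_type, payload) pair and the class and method names are
assumptions, and no buffer sizes or timing are modelled.

```python
from collections import deque


class SystemTargetDecoderModel:
    """Toy model of the STD 4: a demultiplexer feeding three input buffers."""

    def __init__(self):
        # One first-in, first-out input buffer per elementary stream.
        self.buffers = {
            "video": deque(),      # video input buffer 42
            "audio": deque(),      # audio input buffer 43
            "directory": deque(),  # directory input buffer 44
        }

    def demultiplex(self, bit_stream):
        """Route each packet payload to the buffer of its stream."""
        for stream_type, payload in bit_stream:
            self.buffers[stream_type].append(payload)

    def remove_unit(self, stream_type):
        """Remove the oldest unit, as the respective decoder would."""
        return self.buffers[stream_type].popleft()
```

Each decoder then draws only on its own buffer, so the directory
stream can be consumed at a different pace from the video and audio
streams.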

In the example shown in Figure 14, the video input
buffer 42 and the audio input buffer 43 have the
respective storage capacities defined by the MPEG
standards, namely, 46 Kbytes and 4 Kbytes in the MPEG-1
standard. The directory input buffer 44 according to the
invention has a storage capacity of 1K bits, so that it
will hold 10 directory entries. This capacity is of the
same order as, but is larger than, the directory buffer
capacity currently used. These capacities are set in
consideration of the practical constraints imposed by
providing the real decoder 6 using a processor that cannot
include a large amount of storage.

The video decoder 45 removes the video stream from the
video input buffer 42 one video access unit at a time,
i.e., one picture at a time, at a timing corresponding to
the picture rate of the video signal, e.g., once every
1/29.97 seconds in an NTSC system. The amount of the
video stream removed from the video input buffer for each
picture varies because of the different amount of
compression applied to each picture.
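
The effect of removing variable-sized pictures at a fixed picture rate
can be traced with a short simulation. This is a sketch under
simplifying assumptions that are not part of the specification: the
bit stream arrives at a constant number of bits per picture period,
decoding starts after a whole number of periods, and all quantities
are expressed in the same unit (for example Kb). The name
occupancy_trace is hypothetical.

```python
def occupancy_trace(bits_per_period, picture_sizes, startup_periods):
    """Return the buffer fullness just after each picture is removed.

    bits_per_period: bits delivered to the buffer per picture period.
    picture_sizes:   size of each access unit, in decoding order.
    startup_periods: picture periods of filling before the first removal.
    Raises ValueError if a picture is not completely in the buffer
    when it is due to be removed (underflow).
    """
    fullness = bits_per_period * startup_periods
    trace = []
    for size in picture_sizes:
        if size > fullness:
            raise ValueError("video input buffer underflow")
        fullness -= size             # decoder removes one whole picture
        trace.append(fullness)
        fullness += bits_per_period  # input continues during the next period
    return trace
```

Because a large picture empties the buffer far more than the input
refills it in one period, the start-up delay has to be long enough to
cover the largest pictures.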

The audio decoder 46 removes the audio stream from the
audio input buffer 43 one audio access unit at a time at a
predetermined timing.

The directory decoder 47 removes the directory stream
from the directory input buffer one directory entry at a
time as required. For example, in the fast-forward mode
described above, after the access point at the beginning
of each Group of Pictures is read, the directory decoder
removes from the directory input buffer the directory
stream of the directory entry indicating the location of
the access point at the beginning of the next Group of
Pictures.

The structure of an embodiment of the encoder 1
according to the invention is shown in Figure 15. The
encoder generates a multiplexed bit stream from an audio
signal and a video signal for feeding to the medium 5. The
encoder also includes directory information in the
multiplexed bit stream to enable program selections to be
located, and to enable pictures to be displayed in fast
forward and fast rewind operations. In the multiplexed
bit stream, each directory packet of directory information
must be located ahead of the video packets containing the
video stream to which the directory entries in the
directory packets belong. However, the directory entries
in the directory packet are generated from the video
stream following the directory packet. Therefore, the
directory entries must be added to the directory packets
after the video signal has been encoded and multiplexed
into the multiplexed bit stream. The encoder 1 can only
do this in one pass if the medium 5 has a random access
capability (such as a hard disk) so that the medium can
occasionally go back to write the directory entries into
the directory packets. If the medium 5 does not have a
random access capability, or if the medium 5 is a
transmission medium, the encoder can provide the
multiplexed bit stream including directory entries in two
passes. As an example, an embodiment of the encoder will
be described that provides a multiplexed bit stream in two
passes for recording on the master tape from which
distribution media (such as video tapes or video discs)
are manufactured.

In the encoder 1, the digital video signal S2 is fed
into the video encoder 201, and the digital audio signal
S3 is fed into the audio encoder 202. The video stream and
the audio stream from the video encoder 201 and the audio
encoder 202, respectively, are fed, after internal
buffering (not shown), into the multiplexing circuit 203.
The output of the multiplexing circuit 203 is connected to
the digital storage medium (DSM) 210, where the resulting
preliminary multiplexed bit stream is temporarily stored.

The multiplexer 203 assembles the preliminary
multiplexed bit stream by time multiplexing the elementary
streams, i.e., the video stream, the audio stream, and a
directory stream of dummy directory entries, into packets,
and the packets into packs. The multiplexer also adds the
multiplexing layer, i.e., the packet header for each
packet, and the pack header for each pack. The multiplexer
203 receives the headers from the header generator 204,
and receives the dummy directory entries from the dummy
directory entry generator 205.

The multiplexer 203 also feeds the preliminary
multiplexed bit stream to the directory entry generator
231, which counts the bit index of the preliminary
multiplexed bit stream and detects the access point at the
beginning of each Group of Pictures to generate a
directory entry for each access point. The directory
entry generator assembles the directory entries into a
directory stream, which it feeds to the directory storage
medium 233 for storage.

The directory entry counter 235 tracks the state of
the directory input buffer 44 in the system target decoder
4. The directory entry counter monitors the output of
the dummy directory entry generator 205 fed to the
multiplexer 203. Each dummy directory entry fed into the
multiplexer 203 increments the directory entry counter by
one. The directory entry counter 235 also monitors the
output of the directory entry generator 231 fed to the
directory stream storage medium 233. Each directory entry
decrements the count of the directory entry counter by
one.

A preset limit is applied to the directory entry
counter 235 according to the size of the directory input
buffer 44 in the system target decoder 4. When the count
of the directory entry counter reaches the preset limit,
indicating that the directory input buffer is full, the
directory entry counter feeds a buffer full interrupt to
the dummy directory entry generator 205. The buffer full
interrupt stops the dummy directory entry generator from
feeding dummy directory entries to the multiplexer 203.
When the directory buffer has a capacity of 1 Kbit, the
preset limit corresponds to ten dummy directory entries.
When the count of the directory entry counter 235 indicates
that the directory input buffer 44 is empty, the directory
entry counter feeds the buffer empty interrupt to the
multiplexer 203 to cause the multiplexer to insert another
dummy directory packet into the preliminary multiplexed
bit stream.
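
The counting scheme of the directory entry counter 235 can be
sketched as a small class. This is an illustration only: the
interrupts are modelled as boolean return values, and the class and
method names are assumptions rather than elements of the described
circuit. The default limit of ten follows the 1 Kbit example given
above.

```python
class DirectoryEntryCounter:
    """Tracks how many directory entries the STD directory input
    buffer would currently hold."""

    def __init__(self, limit=10):
        self.limit = limit  # preset limit set by the directory buffer size
        self.count = 0

    def dummy_entry_fed(self):
        """A dummy entry went to the multiplexer: count up.

        Returns True when the buffer full interrupt should be raised.
        """
        self.count += 1
        return self.count >= self.limit

    def entry_generated(self):
        """A real entry went to the directory storage medium: count down.

        Returns True when the buffer empty interrupt should be raised.
        """
        self.count -= 1
        return self.count <= 0
```

The full interrupt stops the supply of dummy entries, and the empty
interrupt requests a new dummy directory packet, so the number of
entries per directory packet never exceeds what the directory input
buffer can hold.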

During the second step of the encoding process, in which
the directory entries are written over the dummy directory
entries in the preliminary multiplexed bit stream to
provide the multiplexed bit stream, the digital storage
medium 210 feeds the preliminary multiplexed bit stream
and the directory storage medium 233 feeds the directory
stream to the directory stream insertion circuit 250. The
directory stream controller 256 monitors the preliminary
bit stream read out from the digital storage medium 210 to
determine the locations in the preliminary bit stream of
the directory packets into which the directory stream is
to be inserted. When it detects each directory packet
header, the directory stream controller feeds the
directory stream insert control signal to the directory
stream insertion circuit and the directory stream storage
medium. The directory stream counter 258 determines the
number of directory entries inserted into the directory
packet, and causes the directory stream controller to
change the state of the directory stream insert control
signal when the directory packet is full.

The video encoder 201, the audio encoder 202, the
multiplexer 203, the directory entry counter 235, and the
directory stream counter 258 are all designed to provide a
preliminary multiplexed bit stream that, when notionally
decoded by the system target decoder 4, causes none of the
input buffers 42, 43, and 44 in the system target decoder
to overflow or underflow. This relationship is indicated
by the dotted line S4A.

The encoder 1 operates as follows. At the beginning
of the recording, the multiplexer 203 turns to the header
generator 204 to receive all the headers for the start of
the recording, and feeds these headers to the DSM 210.
The multiplexer then receives from the header generator
the pack header for the first pack in the recording,
followed by the packet header for the first packet. The
first packet of the recording is a directory packet.

The multiplexer 203 then turns to the dummy directory
entry generator 205, and feeds dummy directory entries
from the dummy directory entry generator to the DSM 210.
Each dummy directory entry fed to the multiplexer
increments the directory entry counter 235 by one. When
the count of the directory entry counter reaches the
preset limit corresponding to the number of directory
entries that can be accommodated in the directory input
buffer 44 in the system target decoder 4, the directory
entry counter feeds the buffer full interrupt to the dummy
directory entry generator 205, which causes the dummy
directory entry generator to stop feeding directory
entries into the multiplexer.

After it has fed the directory packet full of dummy
directory entries to the DSM 210, the multiplexer 203
turns back to the header generator 204 to receive the
packet header of the first video packet, which it feeds to
the DSM 210. Then, taking the respective states of the
video input buffer 42 and the audio input buffer 43 in the
system target decoder 4 into account, the multiplexer
multiplexes the video stream and the audio stream together
to provide video packets and audio packets, which it feeds
to the DSM 210.

During this process, the directory entry generator 231
monitors the preliminary multiplexed bit stream fed from
the multiplexer 203 to the DSM 210 to detect each access
point in the bit stream. An access point is an access unit
that is capable of being decoded on its own, without the
need to decode other access units in the bit stream. For
example, a video access point is a picture that is
compressed wholly or partially using intra-picture coding.
An audio access point is any audio access unit. In MPEG
bit streams, an access point occurs at the beginning of
each Group of Pictures. The directory entry generator 231
also counts the bit index of the preliminary multiplexed
bit stream. Each time it detects an access point in the
preliminary multiplexed bit stream, the directory entry
generator converts the bit index of the access point into
a relative address on the final storage medium, i.e., the
video cassette in this example. The directory entry
generator then creates a directory entry for that access
point, which it feeds to the directory entry storage
medium 233 for storage as a unit of the directory stream.

The directory entry counter 235 decrements its count
for each directory entry generated by the directory entry
generator 231 and fed to the directory entry storage
medium 233. When the state of the directory entry counter
corresponds to the directory input buffer 44 of the system
target decoder 4 being empty, the directory entry counter
235 provides the buffer empty interrupt to the multiplexer
203.

The buffer empty interrupt indicates to the
multiplexer 203 that the multiplexer has received all of
the access points whose directory entries will be stored
in the preceding directory packet (in this example, the
directory packet at the beginning of the pack), and that
it must include another directory packet in the
preliminary multiplexed bit stream before the next access
point in the video stream. Accordingly, in response to
the buffer empty interrupt, the multiplexer 203 completes
the current video packet, and the following audio packet,
if any. After this, the multiplexer turns to the header
generator 204 to receive a directory header, which it
feeds to the DSM 210. The multiplexer then turns to the
dummy directory entry generator 205, and feeds dummy
directory entries from the dummy directory entry generator
to the DSM 210 until it receives the buffer full interrupt
from the directory entry counter 235. The multiplexer
then proceeds to multiplex more of the video stream and
the audio stream, until another buffer empty interrupt
indicates that another directory packet must be inserted.
The resulting preliminary multiplexed bit stream recorded
on the DSM 210 is shown in Figure 16A.
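
The first pass described above can be condensed into the following
sketch. It is a simplified model, not the described circuit: audio
packets, headers and packs are omitted, each video packet carries
exactly one access unit, a dummy entry is represented by None, and a
directory entry is simply the packet index of an access point in the
preliminary stream. The function name first_pass and the data layout
are assumptions.

```python
def first_pass(access_units, entries_per_packet):
    """Build the preliminary stream and the separate directory stream.

    access_units: list of (payload, is_access_point) pairs.
    Returns (preliminary_stream, directory_entries).
    """
    stream, directory = [], []
    pending = 0  # dummy entries not yet matched by a real directory entry
    for payload, is_access_point in access_units:
        if pending == 0:
            # "Buffer empty": insert a directory packet full of dummy
            # entries ahead of the next access point.
            stream.append(("DIR", [None] * entries_per_packet))
            pending = entries_per_packet
        if is_access_point:
            directory.append(len(stream))  # position of this access point
            pending -= 1
        stream.append(("VIDEO", payload))
    return stream, directory
```

The directory entries are kept apart from the preliminary stream
because each directory packet has to precede the access points it
describes, whose positions are known only after they have been
multiplexed.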

When the preliminary multiplexed bit stream and the
directory entries for the whole recording are respectively
stored on the digital storage medium 210 and the directory
storage medium 233, the second pass of the encoding
process is performed to replace the dummy directory
entries in the directory packets in the preliminary
multiplexed bit stream with directory entries from the
directory stream to provide the multiplexed bit stream.
The preliminary multiplexed bit stream is reproduced from
the DSM 210 from its beginning, and is fed into the
directory stream insertion circuit 250. The directory
stream controller 256 monitors the preliminary multiplexed
bit stream for directory headers.

Each time the directory stream controller detects a
directory header, it sends the directory entry insert
signal to the directory entry storage medium 233 and to
the directory stream insertion circuit 250, and
initializes the directory stream counter 258 to the preset
value discussed above. In response to the directory entry
insert signal, the directory entry storage medium 233
feeds the directory stream to the directory stream
insertion circuit 250. The directory stream insertion
circuit places each directory entry in the directory
stream into the directory packet following the directory
header in the preliminary multiplexed bit stream. The
directory stream insertion circuit overwrites the dummy
directory entries in the preliminary multiplexed bit
stream with the directory entries. The directory stream
insertion circuit feeds the resulting multiplexed bit
stream to the medium 5 (Figure 14).
The directory stream counter 258 monitors the
directory entries in the directory stream fed to the
directory stream insertion circuit 250. Each directory
entry fed to the directory stream insertion circuit
decrements the directory stream counter by one. When the
directory stream counter reaches zero, the directory
stream counter feeds the packet full signal to the
directory stream insertion controller 256. In response to
this signal, the directory stream insertion controller
changes the state of the directory entry insert signal.
This causes the directory entry storage medium 233 to stop
sending the directory stream to the directory stream
insertion circuit 250, and causes the directory stream
insertion circuit to feed the preliminary multiplexed bit
stream out unchanged as the multiplexed bit stream until
the directory stream controller once more detects a
directory packet header in the preliminary multiplexed bit
stream. The resulting multiplexed bit stream fed to the
medium 5 (Figure 14) is shown in Figure 16B.
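
The second pass can be sketched in the same spirit. The
following Python fragment is an illustrative model only: the
strings "DIR" and "dummy", the function name second_pass and
the choice to leave a dummy entry in place when the directory
stream is exhausted are assumptions made for the example, not
details taken from the specification. The counter is preset
on each directory header and decremented once per inserted
entry; when it reaches zero the rest of the stream passes
through unchanged until the next directory header.

```python
def second_pass(preliminary, entries, capacity):
    """Overwrite dummy directory entries with real ones.

    preliminary -- preliminary multiplexed stream (list of strings)
    entries     -- directory entries in stream order
    capacity    -- preset value of the directory stream counter
    """
    entries = iter(entries)
    out = []
    counter = 0  # directory stream counter; zero means "packet full"
    for item in preliminary:
        if item == "DIR":
            counter = capacity          # preset on every directory header
            out.append(item)
        elif counter > 0 and item == "dummy":
            # Assumption: keep the dummy if no real entry is left.
            out.append(next(entries, "dummy"))
            counter -= 1
        else:
            out.append(item)            # passed through unchanged
    return out


stream = second_pass(
    ["DIR", "dummy", "dummy", "V*", "V*", "DIR", "dummy", "dummy", "V*"],
    ["e0", "e1", "e2"],
    capacity=2,
)
```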

The same basic circuit arrangement can optionally be
used to provide pictures in the fast-rewind mode in
addition to the fast-forward mode. If the same size
directory input buffer 44 is employed in the system target
decoder 4, controlling the multiplexing of the directory
packets according to the state of the directory input
buffer in the system target decoder 4 results in
approximately twice as many directory packets being
inserted into the preliminary multiplexed bit stream as
when pictures are to be provided only in the fast-forward
mode. This is because each directory packet must hold the
directory entries for the n/2 access points following the
directory packet (for use in the fast-forward mode) and
for the n/2 access points before the directory packet (for
use in the fast-rewind mode), where n is the number of
directory entries that can be stored in the directory
input buffer 44 in the system target decoder 4.
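
As a rough check of the factor of two, the following Python
sketch counts the directory packets needed for a given number
of access points. The function name and the assumption that n
is even and that every directory packet is completely filled
are illustrative simplifications.

```python
import math


def directory_packets_needed(access_points, n, rewind=False):
    """Number of directory packets for `access_points` access points.

    n      -- entries that fit in the directory input buffer
    rewind -- True if each packet also serves the fast-rewind mode,
              so that only n/2 entries describe following access points
    """
    forward_entries = n // 2 if rewind else n
    return math.ceil(access_points / forward_entries)


forward_only = directory_packets_needed(100, 10)
both_modes = directory_packets_needed(100, 10, rewind=True)
```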

Figure 17 shows the structure of the decoder 6. The
decoder 6 is designed in consideration of the parameters
of the system target decoder 4 (Figure 14) to decode the
multiplexed bit stream shown in Figure 16B produced by the
encoder 1. As a result, the decoder 6 has a structure very
similar to that of the system target decoder 4.

The decoder 6 includes the demultiplexer 61, which
receives the multiplexed bit stream S5 from the medium 5.
The demultiplexer demultiplexes the multiplexed bit stream
into the video stream S5V, the audio stream S5A, and the
directory stream S5D. Incidentally, as will be described
in more detail below, the demultiplexer also demultiplexes
the video time stamps and the audio time stamps (not
shown) from the multiplexed bit stream.
The video stream S5V from the output of the
demultiplexer 61 is fed into the video input buffer 62,
which precedes the video decoder 65. The audio stream S5A
from the demultiplexer is fed into the audio input buffer
63, which precedes the audio decoder 66. The directory
stream S5D from the demultiplexer is fed into the
directory input buffer 64, which precedes the directory
decoder 67.

The video decoder 65 removes each access unit, i.e.,
picture, of the video stream from the video input buffer
62 for decoding in the order in which the access unit was
received by the video input buffer. The audio decoder 66
removes each access unit of the audio stream from the
audio input buffer 63 for decoding in the order in which
the access unit was received by the audio input buffer.
The directory decoder 67 removes each directory entry of
the directory stream from the directory input buffer 64 in
the order in which the directory entry was received by the
directory input buffer.

The input buffers 62, 63, and 64 will be described in
detail next. It is not possible to decode the elementary
streams multiplexed in the multiplexed bit stream using
completely matching clocks. The first reason for this is
that, as mentioned above, the compression ratios
constantly change. The second reason for this is that the
average transfer rates of the elementary streams from the
medium 5 differ from the average input rate of the
elementary streams to the respective decoders 65, 66, and
67, depending on the error in the sampling rate clocks.
Moreover, the elementary streams are transferred from the
medium 5 via the demultiplexer 61 intermittently, and the
decoders demand the access units of their respective
elementary streams intermittently. Consequently, the
instantaneous transfer rate of the elementary streams from
the medium 5 and the instantaneous input rate of the
elementary streams into their respective decoders do not
match. Therefore, the input buffers 62, 63, and 64 are
provided between the demultiplexer 61 and the respective
decoders 65, 66, and 67 to accommodate the differences in
the average transfer rate and the average input rate, and
in the instantaneous transfer rate and the instantaneous
input rate.

Figure 18 shows in its upper part a bit index curve
showing the time dependency of the transfer of the video
stream S5V in the multiplexed signal from the medium 5
into the video input buffer 62. No video stream is fed
into the video input buffer at first, because the
demultiplexer 61 first feeds the directory stream into the
directory buffer 64. Then, following the first video
packet header in the multiplexed bit stream, the
demultiplexer transfers the video stream in the following
video packet(s) into the video input buffer 62 at a
substantially constant bit rate until it encounters the
next directory packet header in the multiplexed bit
stream. In response to the directory packet header, the
demultiplexer interrupts feeding the video stream into the
video input buffer while it feeds the directory stream in
the directory packet into the directory input buffer 64.
During this interruption, the bit index of the video
stream remains unchanged. At the end of the directory
packet, in response to the packet header of the first
following video packet, the demultiplexer resumes
transferring the video stream contained in the video
packet(s) into the video input buffer until it encounters
another directory packet header in the multiplexed bit
stream. This process is repeated throughout the decoding
process. The bit index at the output of the video input
buffer is the same as that shown in Figure 5D.

Transfer of the video stream into the video input
buffer 62 is also interrupted when the demultiplexer
encounters an audio packet header in the multiplexed bit
stream and transfers the audio stream in the following
audio packet into the audio input buffer 63, as shown in
Figure 4C. These interruptions occur more frequently than
the interruptions to transfer the directory stream, but
they have been omitted from Figure 18 to simplify the
drawing.

Figure 18 shows in its lower part a bit index curve of
the time dependency of the transfer of the directory
stream S5D in the multiplexed signal from the medium 5
into the directory input buffer 64. The demultiplexer 61
detects the directory packet header at the beginning of
the multiplexed bit stream and transfers the directory
access unit contained in the following directory packet
from the medium 5 into the directory input buffer 64.
Following the first directory packet, the demultiplexer
ceases transferring the directory stream into the
directory input buffer while it feeds the video stream in
the following video packet(s) into the video input buffer
62 and the audio stream in the following audio packet(s)
into the audio input buffer 63. Then, the demultiplexer
61 encounters the next directory packet header in the
multiplexed bit stream and feeds the directory stream in
the directory packet following the directory packet header
into the directory input buffer. This process is repeated
throughout the decoding process.

The lower part of Figure 18 also shows the bit index
of the output of the directory input buffer 64. The
initial transfer of directory stream into the directory
input buffer at the beginning of the multiplexed bit
stream fills the directory input buffer to capacity.
Then, as the video stream is received, the directory
decoder 67 removes directory entries one-by-one from the
directory input buffer until the directory input buffer is
empty. However, because the multiplexed bit stream has
been constructed to take account of the operation of the
directory input buffer and the directory decoder, another
directory packet occurs in the multiplexed bit stream
before the next access point. As a result, the directory
stream in the next directory packet is transferred into
the directory input buffer (a) when the directory input
buffer is empty, so that transferring the directory stream
into the directory input buffer does not cause the
directory buffer to overflow, and (b) before the directory
decoder attempts to remove another directory entry from
the directory input buffer, so that removing the next
directory entry does not cause the directory input buffer
to underflow.
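
Conditions (a) and (b) can be checked with a small occupancy
model. The Python sketch below is an illustration only; the
event notation ("dir", k) for a directory packet carrying k
entries and ("ap", 0) for an access point, and the function
name, are assumptions introduced for the example. It tracks
the number of entries in the directory input buffer and
raises an error if a stream would overflow or underflow it.

```python
def simulate_directory_buffer(events, capacity):
    """Return the buffer occupancy after each event.

    events   -- list of ("dir", k) or ("ap", 0) tuples
    capacity -- number of entries the directory input buffer holds
    """
    occupancy = 0
    trace = []
    for kind, count in events:
        if kind == "dir":
            # Transferring a directory packet must not overflow the buffer.
            if occupancy + count > capacity:
                raise OverflowError("directory input buffer overflow")
            occupancy += count
        elif kind == "ap":
            # The directory decoder removes one entry per access point.
            if occupancy == 0:
                raise RuntimeError("directory input buffer underflow")
            occupancy -= 1
        trace.append(occupancy)
    return trace


good = [("dir", 2), ("ap", 0), ("ap", 0), ("dir", 2), ("ap", 0)]
trace = simulate_directory_buffer(good, capacity=2)
```

A stream in which a second full directory packet arrives
before the buffer has emptied fails the overflow check, which
is the situation the placement of directory packets is meant
to rule out.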

Figure 19 shows how the bit indices shown in Figure 18
relate to the multiplexed bit stream produced by the
encoder 1 (Figure 14). In Figure 19, the directory
packets in the bit stream are linked to the transfer of
the directory stream into the directory input buffer 64 by
solid lines, and the events in the video stream of the
multiplexed bit stream are linked to the transfer of the
video stream into the video input buffer 62 by curved
broken lines. Also, transfer of the access point at the
beginning of each group of pictures into the video input
buffer 62 is linked to the removal of the directory entry
for that access point from the directory input buffer by
straight broken lines interconnecting the bit index curve
of the video input buffer 62 and the bit index curve of the
directory input buffer 64.

Figure 20 shows the beneficial effect on the fast
forward operation of a video tape recorder of the rational
sizing and placement of the directory packets in the
multiplexed bit stream resulting from using the modified
system target decoder according to the invention to
control the multiplexing of the multiplexed bit stream.
The resulting sizing of the directory packets in the
multiplexed bit stream ensures that each directory packet
contains only the number of directory entries that can be
accommodated in the directory input buffer 44 of the
system target decoder, and, hence, in the directory input
buffer 64 of the decoder 6. The resulting placing of the
directory packets in the multiplexed bit stream ensures
that the directory entries contained in each directory
packet belong only to the access points in the video
stream in the video packets following the directory packet
and before the next directory packet. Consequently,
Figure 20 differs from Figures 12A to 12E in that the
video tape recorder does not have to go back several times
to read the contents of the directory packet.

During the fast-forward operation illustrated in
Figure 20, the video tape recorder first reads the
directory packet at the beginning of the multiplexed bit
stream, and transfers the directory stream to the
directory input buffer 64. The directory stream fills the
directory input buffer to capacity. The directory decoder
67 then removes the first directory entry from the
directory input buffer, and instructs the video tape
recorder to skip to the address indicated by the first
directory entry. At that address, the video tape recorder
reproduces the video stream of the picture at the access
point, located at that address at the beginning of the
zero-th Group of Pictures. The video stream of the
picture is then decoded for display.

The directory decoder then removes the second
directory entry from the directory input buffer, and
instructs the video tape recorder to skip to the address
indicated by the second directory entry. At that address,
the video tape recorder reproduces the video stream of the
picture at the access point, located at that address at
the beginning of the first Group of Pictures. The video
stream of the picture is then decoded for display.
The process just described repeats until the directory
decoder has removed the tenth directory entry from the
directory buffer and the picture at the access point at
the beginning of the ninth Group of Pictures has been
reproduced and displayed. The directory buffer 64 is now
empty, and, if the directory decoder 67 attempted to
remove another directory entry, it would cause the
directory input buffer to underflow. However, the next
directory packet is located before the next access point.
The video tape recorder reproduces the directory stream
from the directory packet and transfers it into the
directory input buffer, which, being empty, can
accommodate the whole of the directory stream in the
directory packet. The directory decoder then removes the
first directory entry from the directory input buffer, and
instructs the video tape recorder to skip to the address
indicated by the first directory entry. At that address,
the video tape recorder reproduces the video stream of the
picture at the access point, located at that address at
the beginning of the tenth Group of Pictures. The video
stream of the picture is then decoded for display. This
process repeats until the fast-forward process stops.
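
The fast-forward traversal just described can be modelled in
a few lines of Python. This is a sketch under assumptions:
each directory packet is represented as a list of access
point addresses, and appending an address to the result
stands in for the skip, reproduce and decode steps performed
by the video tape recorder and the decoder. The names used
are hypothetical.

```python
from collections import deque


def fast_forward(directory_packets, capacity):
    """Return the access point addresses visited in fast-forward order.

    directory_packets -- list of lists of addresses, in stream order
    capacity          -- entries the directory input buffer can hold
    """
    buffer = deque()   # directory input buffer (first-in first-out)
    visited = []
    for packet in directory_packets:
        # The buffer is empty here, so a packet of at most `capacity`
        # entries can be transferred without overflow.
        if len(packet) > capacity:
            raise OverflowError("directory packet larger than buffer")
        buffer.extend(packet)
        while buffer:
            address = buffer.popleft()   # directory decoder removes an entry
            visited.append(address)      # skip to address, decode the picture
    return visited


addresses = fast_forward([[0, 100], [250, 400]], capacity=2)
```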

The encoder 1 according to the invention has used the
modified system target decoder 4 according to the
invention to size and place the directory packets in the
multiplexed bit stream so that at no time during the
fast-forward process does the decoder 6 have to attempt to
remove directory entries from an empty directory input
buffer (which would result in an underflow of the
directory input buffer) or to fill the directory input
buffer with directory stream when the directory input
buffer is not empty (which would result in an overflow of
the directory input buffer).

Figure 21 shows a second embodiment of the digital
video and audio signal processing system 10A according to
the invention, in which a time stamp buffer and a time
stamp decoder are provided in the modified system target
decoder 4A according to the invention for each of the
audio time stamps and the video time stamps.

Using the modified system target decoder 4A according
to the invention, the encoder 1A is able to optimize the
system video stream buffering delay and other encoding
parameters to generate compliant bit streams with the best
possible picture quality for the required video bit rate,
while keeping the decoder buffering delays as low as is
practical in a one-pass system.

In the system shown in Figure 21, the encoder 1A
receives the video signal S2 from the video signal storage
medium 2, and receives the audio signal S3 from the audio
signal storage medium 3. The audio signal S3 could
alternatively be (and is more usually) also received from
the video signal storage medium 2 instead of from a
separate audio storage medium.

The encoder 1A compresses and codes the video and
audio signals, and multiplexes the resulting audio stream
and video stream to provide the multiplexed bit stream
S1A, which is fed to the medium 5 for storage or
distribution. The medium can be any medium suitable for
storing or distributing a digital bit stream, for example,
a CD-ROM, a laser disk (LD), a video tape, a
magneto-optical (MO) storage medium, a digital compact
cassette (DCC), a terrestrial or satellite broadcasting
system, a cable system, a fibre-optic distribution system,
a telephone system, an ISDN system, etc.

The encoder 1A compresses and codes the video signal
picture-by-picture. Each picture of the video signal is
compressed as an I-picture, a P-picture or a B-picture as
described above.

The digital video and audio processing system 10A also
includes the decoder 6A, which receives as its input
signal the bit stream S5A from the medium 5. The decoder
6A performs demultiplexing inverse to the multiplexing
performed by the encoder 1A. The decoder performs
processing complementary to that performed by the encoder
1A to decode the resulting audio stream and video stream
to provide the recovered video signal S6A and the
recovered audio signal S6B. The recovered video signal
S6A and the recovered audio signal S6B respectively
closely match the video signal S2 and the audio signal S3
fed into the encoder 1A.

Figure 21 also shows the system target decoder (STD)
4A which is used to define the processing characteristics
of the encoder 1A and the decoder 6A. In practical video
and audio signal processing systems, the encoder does not
include an actual system target decoder, but instead
performs the encoding processing and multiplexing taking
account of the system target decoder parameters. Also,
practical decoders are designed taking the system target
decoder parameters into account to minimize hardware cost,
etc. These relationships between the system target
decoder and the encoder and the decoder are indicated in
Figure 21 by the broken line labelled S4A interconnecting
the system target decoder 4A and the encoder 1A, and the
broken line labelled S4B interconnecting the system target
decoder 4A and the decoder 6A.

The system target decoder 4A includes a reference video
decoder 45, a reference audio decoder 46, and their
respective input buffers 42 and 43. In addition, the
system target decoder includes a video time stamp
processing module 55, an audio time stamp processing
module 56, and their respective input buffers 52 and 53.
The size of the audio input buffer, the size of the video
input buffer, and the operation of the audio and video
decoders are defined by the MPEG standards, as described
above. In addition, the invention defines the sizes of
the video time stamp buffer and the audio time stamp
buffer, and the time stamp coding frequency. The size of
the time stamp buffers and the time stamp coding frequency
are defined to optimize the utilization of the other input
buffers.

Again, as discussed above, the concept of the modified
system target decoder according to the invention provides
compatibility between encoders and decoders of different
designs not only with respect to the audio and video
streams, but also with respect to the audio and video time
stamp buffering. In particular, the modified system
target decoder according to the invention provides this
compatibility without the need to impose a maximum on the
buffering delay. This enables the scope of the MPEG
standard to be extended to cover such applications as low
bit-rate video slide shows and the like.

The structure of the hypothetical system target
decoder 4A shown in Figure 21 is as follows. The
demultiplexer 41A notionally receives the bit stream S1A
from the encoder 1A. The demultiplexer 41A demultiplexes
the bit stream into a video stream S1V, an audio stream
S1A, video time stamps VTS and audio time stamps ATS. The
video stream S1V is fed to the video input buffer 42, the
output of which is connected to the video decoder 45. The
audio stream from the demultiplexer 41A is fed into the
audio input buffer 43, the output of which is connected to
the audio decoder 46. The video time stamps from the
demultiplexer 41A are fed into the video time stamp buffer
52, the output of which is connected to the video time
stamp processing module 55. The video time stamp
processing module controls the timing of the decoding of
the video stream by the video decoder 45. The audio time
stamps from the demultiplexer 41A are fed into the audio
time stamp input buffer 53, the output of which is
connected to the audio time stamp processing module 56.
The audio time stamp processing module controls the timing
of the decoding of the audio stream by the audio decoder
46.

In the example shown in Figure 21, the video input
buffer 42 and the audio input buffer 43 have the
respective storage capacities defined by the MPEG
standards, namely, 46K bytes and 4K bytes in the MPEG-1
standard. These capacities are set in consideration of
the practical constraints imposed by providing the decoder
6A using a processor that, because of cost constraints,
cannot have a large amount of storage.

The video decoder 45 removes the video stream from the
video input buffer 42 one video access unit at a time,
i.e., one picture at a time, at a timing corresponding to
the video time stamps and the picture rate of the video
signal, e.g., once every 1/29.97 seconds in an NTSC
system. The amount of the video stream removed from the
video input buffer for each picture varies because of the
different amount of compression applied to each picture.
The audio decoder 46 removes the audio stream from the
audio input buffer 43 one audio access unit at a time at a
timing corresponding to the audio time stamps and a
predetermined timing.

The structure of the encoder 1A is shown in Figure
22A. Access units of the video signal S2 are fed to the
input of the video encoder 201A, which compresses each
access unit, i.e., picture, of the video signal. The
resulting access unit of video stream is fed from the
output of the video encoder to the input of the video
output buffer 300, where it is temporarily stored. The
video stream from the output of the video output buffer is
fed to the multiplexer 203A. Feedback from the video
output buffer to the video encoder prevents the output of
the video encoder from causing the video output buffer to
overflow.

The audio signal S3 is fed to the input of the audio
encoder 202A, which compresses it. The resulting audio
access units are fed from the output of the audio encoder
to the input of the audio output buffer 302, where they
are temporarily stored. The audio stream from the output
of the audio output buffer is fed to the multiplexer 203A.
Feedback from the audio output buffer to the audio encoder
prevents the output of the audio encoder from causing the
audio output buffer to overflow.

The encoder 1A also includes the clock signal
generator 305. In MPEG-1 systems, the frequency of
the clock signal generator is 90 kHz; in MPEG-2 systems,
the frequency is 27 MHz. The output of the clock signal
generator is fed to the clock counter 307, the output of
which provides a clock reference signal. The clock
reference signal has a value that is incremented by one
each cycle of the clock signal. The clock reference
signal is connected to the header generator 204. In the
MPEG-2 standard, the clock counter 307 also divides the
MPEG-2 clock signal by 300 to provide a time stamp clock
reference signal having a value that is incremented by one
at a rate of 90 kHz. The clock counter feeds the time
stamp clock reference signal to the video decoding time
stamp generator 309, the video presentation time stamp
generator 311, and the audio presentation time stamp
generator 313. In MPEG-1, the clock counter 307 feeds the
clock reference signal to the video decoding time stamp
generator 309, the video presentation time stamp generator
311, and the audio presentation time stamp generator 313
as the time stamp clock reference signal.
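
The relationship between the two clock reference signals is
simple arithmetic: 27 MHz divided by 300 is 90 kHz. The
Python sketch below illustrates it; the function and constant
names are hypothetical, and the clock counter is modelled as
a plain integer count of clock cycles.

```python
MPEG1_CLOCK_HZ = 90_000        # clock signal generator, MPEG-1
MPEG2_CLOCK_HZ = 27_000_000    # clock signal generator, MPEG-2
MPEG2_DIVISOR = 300            # 27 MHz / 300 = 90 kHz


def time_stamp_clock_reference(clock_reference, mpeg2):
    """Derive the time stamp clock reference from the clock reference.

    clock_reference -- count of clock cycles since the counter started
    mpeg2           -- True for a 27 MHz MPEG-2 clock, False for MPEG-1
    """
    if mpeg2:
        return clock_reference // MPEG2_DIVISOR   # divided down to 90 kHz
    return clock_reference                        # already counts at 90 kHz


# After one second both systems yield the same time stamp clock value.
one_second_mpeg1 = time_stamp_clock_reference(MPEG1_CLOCK_HZ, mpeg2=False)
one_second_mpeg2 = time_stamp_clock_reference(MPEG2_CLOCK_HZ, mpeg2=True)
```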

The video input signal S2 is also fed to the input of
the video presentation time stamp generator 311. The
video presentation time stamp generator generates a
presentation time stamp (PTS) in response to each picture
of the video input signal and the time stamp clock
reference signal. The presentation time stamps are fed
via the time stamp re-ordering buffer 304 to the video
time stamp buffer 301. Each video presentation time stamp
is the value of the time stamp clock reference signal at
the instant the video encoder receives the start of a
picture of the video input signal.

The time stamp re-ordering buffer 304 receives a
re-order flag signal from the video encoder 201A each time
the latter, in the course of compressing the video input
signal S2, changes the order of the access units of the
video stream relative to the order of the access units of
the video input signal S2. In response to the re-order
flag signal, the time stamp re-ordering buffer changes the
order of the presentation time stamps generated by the
video presentation time stamp generator 311 to match the
order of the access units of the video stream the video
encoder feeds into the video output buffer 300.
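
The effect of the re-ordering can be shown with a short
Python sketch. Note the substitution made for illustration:
the specification signals each re-ordering with a flag,
whereas the sketch below is handed the complete coded order
as a list of display-order indices. The function name and the
example picture sequence (I B B P displayed, I P B B coded)
are assumptions.

```python
def reorder_time_stamps(pts_display_order, coded_order):
    """Return the presentation time stamps in coded (stream) order.

    pts_display_order -- PTS values in the order pictures were input
    coded_order       -- for each coded access unit, the index of the
                         corresponding picture in display order
    """
    return [pts_display_order[i] for i in coded_order]


# Pictures I0 B1 B2 P3 are displayed in that order but coded as
# I0 P3 B1 B2; the time stamps follow the coded order.
pts = [0, 3003, 6006, 9009]          # 90 kHz ticks at 29.97 pictures/s
coded = reorder_time_stamps(pts, [0, 3, 1, 2])
```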

The video encoder 201A feeds a flag signal to the
input of the video decoding time stamp generator 309 at
the same instant as it feeds the start of an access unit of
the video stream to the video output buffer 300. In
response to each flag signal and the time stamp clock
reference signal, the video decoding time stamp generator
generates a video decoding time stamp (video DTS), which
it feeds to the video time stamp buffer 301. The video
decoding time stamp is the value of the time stamp clock
reference signal at the instant the flag signal indicates
that the encoder has fed the start of the access unit of
the video stream into the video input buffer.

The video time stamp buffer 301 temporarily stores the
video time stamps. The video time stamp buffer also
receives and stores pointers from the video encoder 201A
to enable it to relate each video time stamp that it
receives to the picture header of each video access unit
stored in the video output buffer 300. The video time
stamp buffer later feeds the video time stamps to the
multiplexer 203A. The video decoding time stamps are fed
to the multiplexer via the adder 319, where they are
incremented by the value of the SELECTED V BUFFERING DELAY
(which will be described in more detail below). The video
presentation time stamps PTS are fed to the multiplexer
via the adder 321, where they are incremented by the value
of the total video delay (which will be described below).
The multiplexer selectively adds the video time stamps to
the packet headers of the video packets in the multiplexed
bit stream according to the occupancy of the video time
stamp buffer 52 of the system target decoder 4A.
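
The two adders amount to adding a fixed offset to each time
stamp before it reaches the multiplexer. The Python sketch
below illustrates this; the function name is hypothetical and
the delay values are arbitrary placeholders in 90 kHz ticks,
since the actual delays are defined elsewhere in the
description.

```python
def offset_video_time_stamps(dts, pts, selected_v_buffering_delay,
                             total_video_delay):
    """Model of adders 319 and 321.

    dts, pts -- time stamps as generated, in time stamp clock ticks
    Returns the (DTS, PTS) pair handed to the multiplexer.
    """
    adjusted_dts = dts + selected_v_buffering_delay   # adder 319
    adjusted_pts = pts + total_video_delay            # adder 321
    return adjusted_dts, adjusted_pts


# Placeholder delays of 0.2 s and 0.3 s expressed in 90 kHz ticks.
stamps = offset_video_time_stamps(dts=9_000, pts=3_000,
                                  selected_v_buffering_delay=18_000,
                                  total_video_delay=27_000)
```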

The audio encoder 202A feeds a flag signal to the
input of the audio presentation time stamp generator 313
coincident with it feeding the start of each access unit
of the audio stream to the audio output buffer 302. In
response to this flag signal and the time stamp clock
reference signal, the audio presentation time stamp
generator generates an audio presentation time stamp,
which it feeds to the audio time stamp buffer 303. Each
audio presentation time stamp is the value of the time
stamp clock reference signal at the instant the flag
signal indicates that the audio encoder has fed an access
unit of the audio stream into the audio input buffer.

The audio time stamp buffer 303 temporarily stores the
audio presentation time stamps. The audio time stamp
buffer also receives pointers from the audio encoder 202A
to enable it to relate each audio time stamp that it
receives to the address of the header of each audio access
unit stored in the audio output buffer 302. The audio
time stamp buffer 303 later feeds the audio presentation
time stamps to the multiplexer 203A. The multiplexer
selectively adds the audio time stamps to the packet
headers of the audio packets in the multiplexed bit stream
according to the occupancy of the audio time stamp buffer
53 of the system target decoder 4A.

The video output buffer 300, video time stamp buffer
301, audio output buffer 302, audio time stamp buffer 303
and time stamp re-ordering buffer 304 are all first-in
first-out (FIFO) buffers.

The time stamp generators 309, 311, and 313 may be
integrated with their respective video and audio time
stamp buffers 301 and 303. Moreover, a single clock
reference signal could be used, and could be divided by
300 in the time stamp generators to provide the time stamp
clock reference signal.

The header generator 204 generates the various headers
of the multiplex layer, i.e., the pack headers and the
various packet headers. The header generator receives the
clock reference from the clock counter 307, and feeds the
headers into the multiplexer 203A.

Figure 23 shows the structure of the decoder 6A in the
encoding/decoding system 10A. The decoder 6A is designed
in consideration of the parameters of the system target
decoder 4A (Figure 21) to decode the multiplexed bit
stream produced by the encoder 1A. As a result, the
decoder 6A has a structure very similar to that of the
system target decoder 4A.

The decoder 6A includes the demultiplexer 61A, which
receives the multiplexed bit stream S5 from the medium 5.
The demultiplexer demultiplexes the multiplexed bit stream
into the video stream S5V, the audio stream S5A, the video
time stamps S5TV and the audio time stamps S5TA.

The video stream S5V from the output of the
demultiplexer 61A is fed into the video input buffer 62,
which precedes the video decoder 65. The audio stream S5A
from the demultiplexer is fed into the audio input buffer
63, which precedes the audio decoder 66. The video time
stamps S5TV from the demultiplexer are fed into the video
time stamp buffer 72. The video time stamps are read out
from the video time stamp buffer into the video time stamp
processing module 75, which controls the timing of the
decoding of the video access units in the video stream S5V
by the video decoder 65. The audio time stamps S5TA from
the demultiplexer are fed into the audio time stamp buffer
73. The audio time stamps are read out from the audio time
stamp buffer into the audio time stamp processing module
76, which controls the timing of the decoding of the audio
access units in the audio stream S5A by the audio decoder
66.

The video decoder 65 removes each access unit, i.e.,
picture, of the video stream from the video input buffer
62 for decoding in the order in which the access unit was
received by the video input buffer. The audio decoder 66
removes each access unit of the audio stream from the
audio input buffer 63 for decoding in the order in which
the access unit was received by the audio input buffer.

The operation of the encoding and decoding system 10A
described above will now be described.

If still pictures are encoded, the MPEG-2 standard
requires that:

- each still picture have an associated time stamp
that determines how long the picture will be displayed;

- each still picture be displayed for at least 2
picture periods. Consequently, the maximum still picture
rate is, e.g., 25 Hz/2 = 12.50 Hz for PAL display devices,
and 29.97 Hz/2 = 14.99 Hz for NTSC display devices; and

- still picture video consist only of I-pictures.

Consequently, decoders receiving the bit stream from
the encoder must buffer and use all video time stamps to
reconstruct a still picture video bit stream with the
correct timing. In an actual decoding system according to
the invention, a separate video time stamp buffer is used
for this purpose. To allow relatively small time stamp
buffers to be used for this purpose and to guarantee that
such time stamp buffers will never overflow, the system
target decoder according to the invention also includes a
video time stamp buffer (or a functionally-equivalent
parameter constraint) which affects certain parameters of
the encoding system.

Using the arrangement shown in Figure 22B, the
one-pass encoder shown in Figure 22A can configure itself
to comply with the constraints of this model in addition
to being capable of configuring itself to encode a normal
full-motion video signal.

Referring to Figures 22A and 22B, to comply with the
STD video time stamp buffer constraint, the encoder 1A
first determines, at block 351, the STD video stream
buffering delay that will prevent the STD video time stamp
buffer 52 from overflowing. This value will be called
DELAY THAT WORKS.

     DELAY THAT WORKS =
          size of STD time stamp buffer 52 / time stamp coding
          frequency.
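
The calculation at block 351 can be sketched as follows. This
is only an illustration: the function name, the choice of
counting the buffer size in time stamps, and the example
figures are assumptions, not values taken from the
specification.

```python
def delay_that_works(ts_buffer_size: int, ts_coding_frequency_hz: float) -> float:
    """Longest buffering delay (seconds) that cannot overflow the STD time stamp buffer.

    ts_buffer_size is the buffer capacity counted in time stamps (assumed unit);
    ts_coding_frequency_hz is the number of time stamps coded per second.
    """
    if ts_coding_frequency_hz <= 0:
        raise ValueError("time stamp coding frequency must be positive")
    return ts_buffer_size / ts_coding_frequency_hz


# Hypothetical figures: room for 10 time stamps, 2 time stamps coded per second.
example_delay = delay_that_works(10, 2.0)
```

Halving the time stamp coding frequency, or doubling the
buffer size, doubles the delay that the buffer can tolerate.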

In a system with a relatively low video bit rate
(e.g., in many still picture applications), a buffering
delay longer than the value of DELAY THAT WORKS is
necessary for optimum picture quality. Therefore, in such
a system, the time stamp coding frequency is reduced as
much as possible (as is allowed for still-picture video by
the MPEG-2 standard). Using locked encoding systems helps
achieve this goal. Alternatively, the size of the video
time stamp buffer 52 in the system target decoder may be
increased to provide a longer delay. As a further
alternative, both the time stamp coding frequency may be
reduced and the STD video time stamp buffer size may be
increased.

For example, for still picture video using, e.g., a 50
Hz display device, the encoder will calculate the time
stamp coding frequency tscf using the formula:

     tscf = 12.5/N
     (N is a positive integer)

Since the MPEG-2 standard requires that one time stamp
be provided for each still picture, when used for
generating a bit stream representing still picture video,
the video encoder 201A will also generate I-pictures at a
reduced rate, i.e., at the rate of 12.5/N Hz, if the time
stamp coding frequency is reduced. The value of N is set
by the encoder operator.
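
As a rough illustration of the formula tscf = 12.5/N, the
following sketch computes the time stamp coding frequency,
and hence the I-picture rate, for an operator-selected N. The
function and parameter names are assumptions made for this
example only.

```python
def time_stamp_coding_frequency(n: int, max_still_picture_rate_hz: float = 12.5) -> float:
    """Return tscf in Hz for a positive integer divisor N.

    The default of 12.5 Hz is the maximum still picture rate for a
    50 Hz display device; one time stamp is coded per still picture.
    """
    if n < 1:
        raise ValueError("N must be a positive integer")
    return max_still_picture_rate_hz / n


# With N = 5 the encoder codes one time stamp (and one I-picture) every 0.4 s.
tscf = time_stamp_coding_frequency(5)
```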

Block 353 determines the video stream buffering
delay that is needed to generate the worst case (i.e., the
largest possible) picture using the size of the STD video
input buffer 42. This value will be called DELAY FOR BIG
PICTURE.

     DELAY FOR BIG PICTURE =
          size of STD video input buffer 42 / bit rate of the
          video stream.

In practice, to make the video bit stream "safe" for
all decoders, the encoder 1A may use a value smaller than
the actual size of the system target decoder video input
buffer 42 in the above formula.

The value of DELAY FOR BIG PICTURE can easily be
longer than one second in systems in which the video bit
rate is relatively low.

Block 357 compares DELAY FOR BIG PICTURE with DELAY
THAT WORKS to determine the value of the selected decoder
video buffering delay (SELECTED V BUFFERING DELAY). If
DELAY FOR BIG PICTURE ≤ DELAY THAT WORKS, the encoding
system will set the value of SELECTED V BUFFERING DELAY
to DELAY FOR BIG PICTURE.

In some applications, DELAY FOR BIG PICTURE will be
larger than DELAY THAT WORKS. In this case, to satisfy
all STD constraints, the encoder will set the value of
SELECTED V BUFFERING DELAY = DELAY THAT WORKS.

The value of SELECTED V BUFFERING DELAY is fed to the
adder 319 and to block 363.
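
The comparison made by block 357 amounts to taking the
smaller of the two delays. The sketch below assumes the
buffer size is given in bits and the bit rate in bits per
second; the function names and example figures are
hypothetical.

```python
def delay_for_big_picture(std_video_input_buffer_bits: float, video_bit_rate: float) -> float:
    """Time (seconds) needed to fill the STD video input buffer at the video bit rate."""
    return std_video_input_buffer_bits / video_bit_rate


def selected_v_buffering_delay(big_picture_delay: float, working_delay: float) -> float:
    """Use DELAY FOR BIG PICTURE unless it exceeds DELAY THAT WORKS."""
    return min(big_picture_delay, working_delay)


# Hypothetical low bit-rate case: 1,000,000-bit buffer filled at 250 kbit/s.
big = delay_for_big_picture(1_000_000, 250_000)   # 4.0 s
chosen = selected_v_buffering_delay(big, 5.0)     # time stamp buffer tolerates 5 s
limited = selected_v_buffering_delay(big, 3.0)    # time stamp buffer tolerates only 3 s
```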




Block 359 calculates the memory quantity video output
buffer size required for the video output buffer 300. The
memory quantity video output buffer size is calculated
using the SELECTED V BUFFERING DELAY and the available
video bit rate as follows:

     video output buffer size (bytes) =
          SELECTED V BUFFERING DELAY * available video bit
          rate / 8.

Block 359 feeds the value of video output buffer size
to the video output buffer 300.

Block 361 calculates the memory quantity video time
stamp buffer size required for the video time stamp buffer
301. The memory quantity required is that which will hold
the number of presentation time stamps (PTS) and decoding
time stamps (DTS) given by:

     SELECTED V BUFFERING DELAY * time stamp coding
     frequency.

Block 361 feeds the value of video time stamp buffer
size to the video time stamp buffer 301.
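
The two sizing calculations of blocks 359 and 361 can be
sketched as below. Rounding the time stamp count up to a
whole number is an assumption of this example, as are the
function names and the figures used.

```python
import math


def video_output_buffer_size_bytes(selected_delay_s: float, video_bit_rate: float) -> float:
    """Bytes produced at video_bit_rate (bits/s) during the selected buffering delay."""
    return selected_delay_s * video_bit_rate / 8


def video_time_stamp_buffer_size(selected_delay_s: float, tscf_hz: float) -> int:
    """Number of time stamp entries coded during the selected buffering delay.

    Rounded up so that a fractional entry still gets a slot (assumption).
    """
    return math.ceil(selected_delay_s * tscf_hz)


# Hypothetical figures: 4 s delay, 250 kbit/s video, 2.5 time stamps per second.
out_bytes = video_output_buffer_size_bytes(4.0, 250_000)
ts_entries = video_time_stamp_buffer_size(4.0, 2.5)
```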

At blocks 363, 365 and 367, the encoder calculates the
audio encoder buffering delay (from which the audio output
buffer size and the audio time stamp buffer size are
calculated) from the total video delay and the audio
decoder buffering delay. To achieve end-to-end
synchronization between audio and video, the end-to-end
delays of the video stream and the audio stream through
the encoder and the decoder must be equal, as shown in
Figure 24B.

Figure 24A shows the components of the end-to-end
system delay of the video stream, which is calculated in
block 363. This delay is called the total video delay.

     total video delay =
          SELECTED V BUFFERING DELAY + SELECTED V REORDERING
          DELAY.

The value of the SELECTED V REORDERING DELAY (SVRD),
which also affects picture quality, is usually one or more
picture periods. The SELECTED V REORDERING DELAY is the
sum of two components, namely, the video encoder
reordering delay (verd) and the video decoder reordering
delay (vdrd). In this example, verd is assumed to be
zero, and vdrd is set to one picture period.
Consequently, the SELECTED V REORDERING DELAY is one
picture period.

The SELECTED V BUFFERING DELAY is also the sum of two
components, namely, the video encoder buffering delay
(vebd) and the video decoder buffering delay (vdbd).

The value of total video delay calculated by the block
363 is fed to the adders 321 and 323, and to the block 367.
The audio input buffer 43 of the system target
decoder 4A is relatively small, and the audio decoder 46
removes the audio stream from the audio input buffer at a
relatively constant rate. Furthermore, the audio access
units are not reordered. Block 365 calculates the audio
decoder buffering delay (adbd) of the audio stream in the
STD as follows:

     audio decoder buffering delay =
          size of STD audio input buffer 43 / audio bit rate.

In practice, to make the audio bit stream "safe" for
all decoders, the encoder may use a value smaller than the
actual size of the system target decoder audio input
buffer 43 in the above formula.

The audio decoder buffering delay is small compared
with the total video delay. As a result, the audio
decoder buffering delay (adbd) calculated by block 365 is
usually relatively short. To provide the required
end-to-end synchronization between audio and video, it is
not usually possible to reduce the total video delay
because of picture quality requirements. Therefore, the
block 367 calculates from the total video delay and the
audio decoder buffering delay a value of the audio encoder
buffer delay (aebd) that is sufficiently large to make the
total audio delay match the total video delay, as shown in
Figure 24B.
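
The text gives no explicit formula for block 367. One
reading, offered here only as an assumption, is that the
audio encoder buffering delay is the difference between the
total video delay and the audio decoder buffering delay,
since the audio access units are not reordered. All names
and figures in the sketch are hypothetical.

```python
def total_video_delay(selected_v_buffering_delay: float,
                      selected_v_reordering_delay: float) -> float:
    """Block 363: end-to-end delay of the video stream, in seconds."""
    return selected_v_buffering_delay + selected_v_reordering_delay


def audio_decoder_buffering_delay(std_audio_input_buffer_bits: float,
                                  audio_bit_rate: float) -> float:
    """Block 365: time to fill the STD audio input buffer, in seconds."""
    return std_audio_input_buffer_bits / audio_bit_rate


def audio_encoder_buffering_delay(total_video_delay_s: float, adbd_s: float) -> float:
    """Block 367 (assumed form): make total audio delay equal total video delay."""
    aebd = total_video_delay_s - adbd_s
    if aebd < 0:
        raise ValueError("audio decoder delay exceeds total video delay")
    return aebd


# Hypothetical figures: 4 s video buffering, one 25 Hz picture period of reordering,
# a 32,768-bit audio input buffer and 256 kbit/s audio.
tvd = total_video_delay(4.0, 0.04)
adbd = audio_decoder_buffering_delay(32_768, 256_000)
aebd = audio_encoder_buffering_delay(tvd, adbd)
```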

To provide the audio encoder buffering delay aebd
calculated by block 367, block 369 calculates the memory
quantity audio output buffer size required for the audio
output buffer 302 as follows:

     audio output buffer size (bytes) =
          audio encoder buffering delay * audio bit rate / 8

The block 369 feeds the value of audio output buffer
size to the audio output buffer 302.

Block 371 calculates the memory quantity audio time
stamp buffer size (in time stamps) required for the audio
time stamp buffer 303 as follows:

     audio time stamp buffer size (time stamps) =
          audio encoder buffering delay * audio access unit
          rate.

The block 371 feeds the value of audio time stamp
buffer size to the audio time stamp buffer 303.
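
A minimal sketch of the calculations at blocks 369 and 371
follows. It assumes that the audio access unit rate equals
the sampling rate divided by the number of samples per
access unit, and that the time stamp count is rounded up;
the names and figures are illustrative only.

```python
import math


def audio_output_buffer_size_bytes(aebd_s: float, audio_bit_rate: float) -> float:
    """Block 369: bytes of audio stream held during the audio encoder buffering delay."""
    return aebd_s * audio_bit_rate / 8


def audio_time_stamp_buffer_size(aebd_s: float, sampling_rate_hz: float,
                                 samples_per_aau: int) -> int:
    """Block 371: time stamps generated during the audio encoder buffering delay."""
    access_unit_rate = sampling_rate_hz / samples_per_aau
    return math.ceil(aebd_s * access_unit_rate)


# Hypothetical figures: 4 s delay, 256 kbit/s audio, 48 kHz sampling,
# 1152 samples per audio access unit.
audio_bytes = audio_output_buffer_size_bytes(4.0, 256_000)
audio_ts = audio_time_stamp_buffer_size(4.0, 48_000, 1152)
```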

The above encoder set up procedure was described with
reference to a low bit-rate application. A similar
procedure can be used to set up the encoder 1A for normal
full-motion video, or for applications, such as
professional video applications, in which a very short
buffering delay (e.g., about 0.2 s) is required.

Returning now to Figure 22A, after the encoder has
calculated the parameters just described, and has used
these parameters to set up the video output buffer 300,
the video time stamp buffer 301, the audio output buffer
302, the audio time stamp buffer 303 and the adders 319,
321, and 323, the encoder operates with these parameters
to encode the video input signal S2 and the audio input
signal S3 as follows. The video encoder 201A and the
audio encoder 202A start encoding their respective input
signals at the same time. Once the encoding process has
started, and until the end of the respective input signals
S2 and S3, the video encoder 201A will generate video
access units at the selected picture rate and feed them to
the video output buffer 300, and the audio encoder 202A
will generate audio access units (AAU) depending on the
selected audio sampling rate and number of samples per
AAU, and feed them to the audio output buffer 302. The
video encoder 201A includes a rate control mechanism
(indicated by the path connecting the video output buffer
and the video encoder) that prevents overflow of the video
output buffer 300. By preventing overflow of the video
output buffer having a size set according to the value of
video output buffer size, as described above, the video
encoder 201A executes one of the tasks necessary to make
the multiplexed bit stream S1A compliant with the
constraints imposed by the system target decoder 4A.

During the encoding process, the 33-bit clock
reference signal from the clock counter 307 continuously
increments at the rate of 90 kHz in an MPEG-1 system, or at
27 MHz in an MPEG-2 system. Also, in an MPEG-2 system, the
33-bit time stamp clock reference signal increments at the
rate of 90 kHz.
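
The relationship between the 27 MHz clock and the 90 kHz
time stamp clock (a division by 300) can be sketched as
follows. Treating the 33-bit value as wrapping modulo 2^33
is an assumption of this example, and the names are
hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000   # MPEG-2 clock reference rate
DIVIDER = 300                  # 27 MHz / 300 = 90 kHz
COUNTER_BITS = 33              # width of the time stamp clock reference


def time_stamp_clock(system_clock_ticks: int) -> int:
    """Convert a 27 MHz tick count into a 33-bit, 90 kHz time stamp value."""
    return (system_clock_ticks // DIVIDER) % (1 << COUNTER_BITS)


# One second of 27 MHz ticks corresponds to 90,000 time stamp ticks.
one_second = time_stamp_clock(SYSTEM_CLOCK_HZ)
```

At 90 kHz a 33-bit counter wraps after 2^33 / 90,000
seconds, i.e., roughly 26.5 hours.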

Each time the beginning of an access unit of the video
input signal S2 arrives at the video encoder 201A, the
video PTS generator 311 determines the value of the time
stamp clock reference signal from the clock counter 307 as
a video presentation time stamp (PTS). The video PTS
generator feeds the PTS to the time stamp re-ordering
buffer 304, where it is temporarily stored. The PTS is
associated with the address of the picture header of the
corresponding video access unit in the re-ordering buffer
by, for example, a pointer received from the video
encoder. If, in encoding the video input signal, the
video encoder reorders a video access unit of the video
input signal S2, the video encoder feeds the re-order flag
to the time stamp re-ordering buffer. In response to the
re-order flag, the time stamp re-ordering buffer re-orders
the PTS belonging to that access unit. In other words,
the time stamp re-ordering buffer re-orders the PTSs so
that their order at the output of the time stamp
re-ordering buffer 304 is the same as the order of video
access units at the output of the video encoder 201A. The
time stamp re-ordering circuit feeds the video
presentation time stamps to the video time stamp buffer
301.

Each time the video encoder 201A feeds an access unit
of the video stream into the video output buffer 300, the
video DTS generator 309 determines the value of the time
stamp clock reference signal from the clock counter 307 as
the video decoding time stamp (video DTS) of that video
access unit. The video DTS generator feeds the video DTS
to the video time stamp buffer 301, where it is stored
together with the PTS from the time stamp re-ordering
buffer 304. Together with the video time stamps, the
video time stamp buffer also receives from the video encoder
201A and stores a pointer that indicates the address in
the video output buffer 300 of the picture header of the
video access unit to which the time stamps belong.

Each time the audio encoder 202A feeds an access unit
of the audio stream into the audio output buffer 302, the
audio PTS generator 313 determines the value of the time
stamp clock reference signal from the clock counter 307 as
the audio presentation time stamp (audio PTS) of that
audio access unit. The audio PTS is stored in the audio
time stamp buffer 303, together with a pointer indicating the
address in the audio output buffer 302 of the header of
the access unit to which the audio time stamp belongs.

To generate the correct time stamp values, except for
the picture reordering delay, the video encoder 201A and
the audio encoder 202A theoretically produce access units
instantaneously, and without delay. Consequently, for
certain pictures, the video PTS and the video DTS stored
in the time stamp buffer will have exactly the same
values. Because real hardware implementations operate
with delays, these delays must be taken into account when
the time stamps are generated. For example, the time
stamp generators 309, 311 and 313 can provide time stamp
values that are additionally incremented to take account
of real processing delays.

When the beginning of the video stream enters the
video output buffer 300, the header generator 204
generates a header, which it feeds to the multiplexer
203A. The header generator receives the clock reference
signal from the clock counter 307, and includes in the
clock reference field of the header the value of the clock
reference signal at the instant that the head of the video
stream entered the video output buffer.

Next, the header generator 204 generates the video
packet header for the first video packet of the
multiplexed bit stream, and feeds the video packet header
to the multiplexer 203A. The video packet header includes
a length field, the value of which depends on the number
of bytes of video stream that will follow the video packet
header. The video packet length depends on the
application, and on the multiplexing strategy.

If the video packet includes an access unit header,
the video packet header may also include a time stamp.
Whether the video packet header is to include a time stamp
can be determined by checking the video stream to be
inserted in the video packet (which depends on the current
read pointer to the video output buffer 300 and the video
packet length) and by checking whether the pointer stored
in video time stamp buffer 301 points to this segment of
the video stream. Also, the multiplexer performs
processing that emulates tracking the state of occupancy
of the video time stamp buffer 52 in the system target
decoder. If adding a time stamp to the video packet
header would cause the video time stamp buffer to
overflow, the multiplexer will not add a time stamp. On
the other hand, if the video time stamp buffer is close to
empty, the multiplexer may begin a new video packet so
that a time stamp can be added to the multiplexed bit
stream. In the manner just described, the multiplexer
prevents the video time stamp buffer from overflowing or
underflowing. Similar processing is carried out to
prevent the audio time stamp buffer 53 from overflowing or
underflowing.
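
The occupancy tracking performed by the multiplexer might be
modelled as in the sketch below. It assumes, purely for
illustration, that a time stamp leaves the modelled STD
buffer at the decoding time of its access unit; the class
and method names are hypothetical and do not come from the
specification.

```python
from collections import deque


class StdTimeStampBufferModel:
    """Encoder-side model of the occupancy of an STD time stamp buffer."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = deque()  # decoding times of time stamps still in the buffer

    def _drain(self, now: float) -> None:
        # Drop entries whose access units have already been decoded (assumption).
        while self.pending and self.pending[0] <= now:
            self.pending.popleft()

    def try_add(self, now: float, decode_time: float) -> bool:
        """Add a time stamp only if the modelled buffer would not overflow."""
        self._drain(now)
        if len(self.pending) >= self.capacity:
            return False  # multiplexer omits the time stamp from this packet header
        self.pending.append(decode_time)
        return True


model = StdTimeStampBufferModel(capacity=2)
results = [
    model.try_add(0.0, 4.0),   # accepted
    model.try_add(0.5, 4.5),   # accepted, buffer now full
    model.try_add(1.0, 5.0),   # rejected, would overflow
    model.try_add(4.0, 8.0),   # accepted, first entry has been removed
]
```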

The decoding time stamps and presentation time stamps
are respectively fed from the video time stamp buffer 301
into the multiplexer 203A via the adders 319 and 321. The
adder 321 increments each presentation time stamp by the
value of the total video delay calculated by the total video
delay calculation circuit 363 as described above, and the
adder 319 increments each decoding time stamp by the
SELECTED V BUFFERING DELAY calculated by the SELECTED V
BUFFERING DELAY calculating circuit 357 as described
above. If the incremented PTS and the incremented DTS
have different values, the multiplexer 203A will insert
both of them into the video packet header. If the
incremented PTS and the incremented DTS have the same
value (i.e., when the picture is a B-picture) only one
time stamp is inserted into the video packet header.
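
The action of the adders 319 and 321, and the rule for
inserting one or two time stamps, can be sketched as
follows. Values are in 90 kHz ticks; the function name and
the example figures are assumptions for illustration.

```python
def packet_time_stamps(pts: int, dts: int,
                       total_video_delay: int,
                       selected_v_buffering_delay: int) -> tuple:
    """Return the time stamps to place in the video packet header.

    Adder 321 offsets the PTS by the total video delay; adder 319 offsets
    the DTS by the selected buffering delay. If the results coincide, only
    one time stamp is returned.
    """
    out_pts = pts + total_video_delay
    out_dts = dts + selected_v_buffering_delay
    if out_pts == out_dts:
        return (out_pts,)
    return (out_pts, out_dts)


# Hypothetical delays: 4 s buffering (360,000 ticks) plus one 25 Hz picture
# period of reordering (3,600 ticks).
both = packet_time_stamps(0, 0, 363_600, 360_000)
single = packet_time_stamps(0, 3_600, 363_600, 360_000)
```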

When the video input signal S2 is a full-motion video
signal, the multiplexer 203A will read the video stream
for the video packet from the video output buffer 300 and
insert it into the multiplexed bit stream S1A after
completing the video packet header. While the video
stream is being read from the video output buffer 300, the
read pointer to the video output buffer 300 is compared
with the oldest pointer in the time stamp buffer 301 that
points to the address of one of the picture headers stored
in the video output buffer 300. When these pointers are
equal, the PTS, DTS and associated pointer will be removed
from the video time stamp buffer 301. This happens when
the video packet includes more than one picture header.

When the video input signal S2 is an MPEG-style still
picture video signal, because each picture must have an
associated time stamp, the encoder will insert a new video
packet header including time stamps just before each
picture header.

The encoder will reduce the size of a video packet
and/or stop inserting new video packets into the
multiplexed bit stream for a number of reasons, including:

1. to insert an audio packet into the multiplexed bit
stream;

2. the video output buffer 300 is empty; or

3. there is no more video stream.

Case 1 occurs at regular intervals that are shorter
than the audio decoder buffer delay adbd. The first audio
packet will not be inserted into the multiplexed bit
stream until the audio encoder buffer delay time aebd has
elapsed. However, dummy audio packets (or other useful
information included in packets with the same size as
audio packets) may be inserted into the multiplexed bit
stream instead of audio packets before this time has
elapsed. This maintains the video bit rate at the
intended video bit rate, and prevents a temporary increase
in the video bit rate that may violate the STD buffering
constraints.

After the audio encoder buffer delay time aebd has
elapsed, an actual audio packet is generated, and the
header generator 204 will generate an audio packet header.
If the audio packet includes an audio access unit header,
the audio time stamp buffer 303 will feed the oldest audio
PTS stored therein to the multiplexer 203A, and the
multiplexer will include the PTS in the audio packet
header. The audio PTS is fed via the adder 323, which
increments the oldest audio PTS by the value of the total
video delay calculated by the total video delay
calculating circuit 363, as described above.

As the multiplexer 203A transfers the audio stream
from the audio output buffer 302 to the multiplexed bit
stream S1A, the audio time stamp buffer 303 will discard
those time stamps whose pointers point to addresses in the
audio output buffer equal to the read pointer of the audio
output buffer 302.

Audio packets will continue to be generated until all
the audio stream generated by the audio encoder 202A from
the audio input signal has been inserted into the
multiplexed bit stream S1A. If, after this, any other
elementary stream data needs to be transmitted, this
stream data can be inserted into the multiplexed bit
stream S1A. Otherwise, dummy packets are again inserted
into the multiplexed bit stream S1A at regular intervals
instead of actual audio packets in order to maintain the
intended video bit rate.

Concerning case 2, in constant bit rate systems, the
video encoder 201A monitors the occupancy of video output
buffer 300, and can usually prevent the video output
buffer 300 from becoming empty. The video encoder can
generate additional video stream to refill the video
output buffer by reducing the video compression ratio when
the video output buffer approaches empty. If, despite
such measures, the video output buffer 300 does become
empty, the multiplexer 203A can include other useful
information in the multiplexed bit stream S1A instead of
the video stream. If such useful information is not
available, the multiplexer can include stuffing bits in
the multiplexed bit stream to maintain the target bit
rate.

In a variable bit rate system, the multiplexer 203A
can simply wait until it is time to write an audio packet
or, if it is too early to write an audio packet, it can
wait until a new video access unit enters the video output
buffer 300. This can then lead to generation of a new
video packet.

Case 3 occurs when all the video input signal S2 has
been converted into the multiplexed bit stream S1A. The
encoder may continue to generate other packets if data
streams for such packets are still to be inserted in S1A.

Figure 25 illustrates the operation of the decoder 6A
with a low bit rate multiplexed bit stream. The low bit
rate multiplexed stream shown in Figure 25 does not comply
with the MPEG-2 still picture video requirements set forth
above. The MPEG standard provides a multiplexed bit
stream including a video stream with a picture rate that
is an integral fraction of the normal picture rate of
about 25 or 30 frames per second (the highest picture rate
allowed is one half of the normal picture rate). The MPEG
standard leaves it to the decoder to perform non-standard
processing to derive from the multiplexed bit stream a
video signal with the normal picture rate for feeding to
a display device that requires a video signal with a
normal picture rate. The decoder does this by reading out
each of the decoded pictures stored in its output buffer
several times at the normal picture rate. The additional
processing required to decode the video stream with the
below-normal picture rate increases the complexity and
cost of the decoder.

Additional complexity in the decoder can be avoided by
providing to the decoder a still picture video stream
having a normal picture rate. An uncompressed still
picture video signal consists of consecutive pictures at
the normal picture rate. Consecutive pictures are
identical except at the points in the video signal at
which the picture changes. Such a signal is encoded by
coding the first picture after a picture change as an
I-picture. All the other pictures in the video signal are
also coded, but as minimal P-pictures. The video stream
resulting from each of such pictures is little more than
headers, and requires only a few hundred bits.
Consequently, low bit-rate still picture video can be
provided using a video stream that has a normal picture
rate with only a slight reduction in the number of bits
available to code the first picture after each picture
change.
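
The choice of picture type described above can be sketched
as follows. The sketch only assigns picture types; it does
not perform any compression, and the function name and
input representation are assumptions for illustration.

```python
def picture_types(frames: list) -> list:
    """Assign "I" to the first picture after each change and "P" to repeats.

    frames is a sequence of comparable picture contents sampled at the
    normal picture rate; identical consecutive entries are the same still.
    """
    types = []
    previous = object()  # sentinel that never equals a real frame
    for frame in frames:
        types.append("P" if frame == previous else "I")
        previous = frame
    return types


# Two still pictures, held for three and two picture periods respectively.
types = picture_types(["a", "a", "a", "b", "b"])
```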

The structure of the multiplexed bit stream S5A
received by the decoder 6A from the medium 5 is shown
across the top of Figure 25. The video stream consists of
plural pictures at the standard picture rate, i.e., 25 or
30 frames per second. The pictures are grouped into
groups of pictures (GOP), each of which begins with the
first picture following a picture change (an I-picture),
followed by a number of P-pictures. The number of
P-pictures corresponds to the number of normal picture
periods between each picture change in the still picture
video signal (in the example shown, nine picture
periods). The GOPs are included in the video stream so
that each GOP is preceded by a video packet header
including time stamps.

Figure 25 also shows, in the upper bit index curve,
the bit index of the video input buffer 62 and, in the
lower bit index curve, the bit index of the video time
stamp buffer 72.

At the beginning of the video stream, the time stamp
in the first video packet header is fed from the
demultiplexer 61A into the video time stamp buffer 72.
Once the video packet header has been demultiplexed,
the video stream of the first picture accumulates in the
video input buffer at a substantially constant rate (the
interruptions in the flow that occur each time an audio
packet is fed into the audio input buffer 63 and each time
a video packet header is demultiplexed have been omitted
for clarity). The video stream is contained in several
video packets due to the need to include audio packets at
regular intervals in the multiplexed bit stream, and the
requirement that a time stamp (which requires a video
packet header) be included in the video stream at least
once every 0.7 seconds. Due to the low input bit rate, it
takes about one second for the video stream of one
I-picture to accumulate in the video input buffer 62.
Then, after the video stream of the first I-picture has
been stored in the video input buffer, the video streams
of the P-pictures following the I-picture are fed into the
video input buffer.

When the picture header of the first picture in the
video stream following the video packet header including a
time stamp is written into the video input buffer, a
pointer to the address of the picture header is written in
a table in the video time stamp buffer 72.

During accumulation of the video stream in the video
input buffer 62, additional time stamps accumulate in the
video time stamp buffer 72, as shown in the lower bit
index curve. These time stamps do not cause the video
time stamp buffer to overflow because the encoder
controlled the addition of time stamps to the video stream
in consideration of the occupancy of the video time stamp
buffer.

After the initial buffering delay, which allows
sufficient video stream to accumulate in the video input
buffer 62, the video stream of the first I-picture is
removed from the video input buffer. In the example
shown, the initial buffering delay is four seconds. Once
the initial buffering delay is over, the video decoder 65
removes access units of the video stream from the video
input buffer at the normal picture rate. During removal
of these video streams from the video input buffer, the
bit index shown in the Figure changes imperceptibly due to
the small size of these pictures. The video decoder also
checks the table in the video time stamp buffer using the
read pointer to the video input buffer 62. From the table,
the video decoder can determine whether the picture has a
time stamp (in still picture video, all the I-pictures will
have a time stamp, but not all the P-pictures will have a
time stamp. In full motion video, not all pictures will
have a time stamp since the time stamp buffer has
insufficient size to accommodate a time stamp for every
picture). If the picture has a time stamp, the time stamp
for the picture will be removed from the video time stamp
buffer, and will be used to determine the decoding time of
the picture. If the picture lacks a time stamp, the
decoding time will be determined by the decoder clock. The
resulting decoded pictures are fed to the decoder output
at the normal picture rate to provide the still picture
display.
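
The way the decoder picks a decoding time for each picture
might be sketched as follows. The sketch assumes that
"determined by the decoder clock" means one picture period
after the previous decoding time; that reading, the function
name and the figures are assumptions of this example.

```python
def decoding_times(time_stamps: list, picture_period: int, start_time: int = 0) -> list:
    """Return a decoding time for each picture, in 90 kHz ticks.

    time_stamps holds one entry per picture: the decoding time stamp if
    the picture has one, or None if it must be timed from the decoder clock.
    """
    times = []
    clock = start_time
    for stamp in time_stamps:
        if stamp is not None:
            clock = stamp          # resynchronise to the transmitted time stamp
        times.append(clock)
        clock += picture_period    # free-run until the next time stamp
    return times


# Hypothetical 25 Hz stream (3,600 ticks per picture): an I-picture with a
# time stamp, two unstamped P-pictures, then another stamped I-picture.
times = decoding_times([1_000, None, None, 11_800, None], 3_600)
```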

In phase-locked systems, time stamps are only required
to set the start up delays of the audio decoder and the
video decoder. Because the decoders are locked to a
common reference, there is no need to use the time stamps
to maintain synchronism between the video decoder and the
audio decoder. In such a system, the first audio time
stamp and the first video time stamp are respectively used
to set the audio start up delay and the video start up
delay. All other time stamps are ignored.

In such a system according to the invention, the
system target decoder is defined as follows. The time
stamp buffers 52 and 53 have a capacity of one time stamp.
Operation of the video decoder 55 is defined so that it
removes a time stamp from the video time stamp buffer only
at the beginning of the multiplexed bit stream and at no
other time. Operation of the audio decoder 56 is defined
so that it removes a time stamp from the audio time stamp
buffer 53 only at the beginning of the multiplexed bit
stream and at no other time. The video decoder 55 and the
audio decoder 56 are locked to a common clock reference.

With such a system target decoder, the encoder will
add the first video time stamp generated and the first
audio time stamp generated to the multiplexed bit stream.
The decoder removes these time stamps from the time
stamp buffers. Since the STD will require no more time
stamps, the encoder adds no more time stamps to the
multiplexed bit stream. This makes it possible to
eliminate the time stamp fields from the packet headers,
allowing the bits saved to be used for other purposes.
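
The encoder behaviour described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the class `PhaseLockedStd`, the function `multiplex`, and the representation of packets as `(stream, time_stamp)` tuples are all hypothetical and do not come from the specification. The model captures one point only: in the phase-locked system target decoder, each decoder takes a time stamp at the beginning of the bit stream and never again, so the emulating encoder attaches only the first time stamp of each stream.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class PhaseLockedStd:
    """Hypothetical model of the STD's one-time-stamp buffers.

    Each stream's decoder removes a time stamp only at the beginning of
    the multiplexed bit stream, so the STD needs exactly one per stream.
    """
    needs_stamp: Dict[str, bool] = field(
        default_factory=lambda: {"video": True, "audio": True})

    def wants_time_stamp(self, stream: str) -> bool:
        return self.needs_stamp[stream]

    def accept_time_stamp(self, stream: str) -> None:
        # The start-up time stamp has been delivered; none is needed later.
        self.needs_stamp[stream] = False


def multiplex(units: Iterable[Tuple[str, int]]) -> List[Tuple[str, Optional[int]]]:
    """Attach a time stamp to a packet only if the emulated STD needs it."""
    std = PhaseLockedStd()
    packets: List[Tuple[str, Optional[int]]] = []
    for stream, time_stamp in units:
        if std.wants_time_stamp(stream):
            std.accept_time_stamp(stream)
            packets.append((stream, time_stamp))
        else:
            # No time stamp field is used for this packet.
            packets.append((stream, None))
    return packets


# Hypothetical access units as (stream, generated time stamp).
units = [("video", 0), ("audio", 10), ("video", 40), ("audio", 34), ("video", 80)]
packets = multiplex(units)
```

Only the first video packet and the first audio packet carry a time stamp in `packets`; every later packet has `None`, which corresponds to the time stamp fields that could be dropped from the packet headers.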

The invention has been described with respect to a
system in which both audio and video streams are included
in the multiplexed bit stream. However, the invention can
be applied equally well to systems in which either an
audio stream or a video stream is included in the
multiplexed bit stream without the other. The invention
can also be applied to streams resulting from compressing
other types of information signal. The invention has also
been described with respect to the MPEG-1 and MPEG-2
standards, but the invention can be applied equally well
to information streams and bit streams that do not comply
with the MPEG standards.

CLAIMS

1. A method of generating a bit stream by
multiplexing non-compressed auxiliary information with an
information stream, the information stream being obtained
by compressing fixed-size units of an information signal
with a varying compression ratio to provide varying-sized
units of the information stream, the auxiliary information
being for use in subsequently decoding the information
stream, units of the auxiliary information corresponding
to the units of the information signal, the method
comprising the steps of:

dividing the information stream in time into
information stream portions;

dividing the non-compressed auxiliary information
in time into auxiliary information portions;

interleaving the information stream portions and
the auxiliary information portions to provide the bit
stream; and

controlling the information stream dividing,
auxiliary information dividing, and interleaving steps by
emulating decoding of the bit stream by a hypothetical
system target decoder including a demultiplexing means for
demultiplexing the bit stream, a serial arrangement of an
information stream buffer and an information stream
decoder, and a serial arrangement of an auxiliary
information buffer and an auxiliary information processor,
each serial arrangement being connected to the
demultiplexing means, the information stream dividing,
auxiliary information dividing, and interleaving steps
being controlled such that the information stream buffer
and the auxiliary information buffer neither overflow nor
underflow.

2. The method of claim 1, wherein, in the step of
controlling the information stream dividing, auxiliary
information dividing, and interleaving steps:

the demultiplexing means receives the bit stream
and extracts therefrom the information stream and the
auxiliary information for feeding to the information
stream buffer and the auxiliary information buffer,
respectively;

the information stream buffer has a first target
size;

the auxiliary information buffer has a second
target size;

the information stream decoder removes the
varying-sized units of the information stream from the
information stream buffer at a first target timing; and

the auxiliary information processor removes the
corresponding fixed-sized units of the auxiliary
information from the auxiliary information buffer at a
second target timing.

3. The method of claim 2, wherein, in the
interleaving step:

the bit stream comprises plural layers; and

the information stream portions and the auxiliary
information portions are interleaved in the same one of
the plural layers of the bit stream.

4. The method of claim 3, wherein the auxiliary
information is directory information for the information
stream.

5. The method of claim 4, wherein the information
stream includes plural access points, and each unit of the
directory information relates to one of the access points.

6. The method of claim 5, wherein:

in the step of dividing the auxiliary information
into auxiliary information portions, the directory
information is divided into a directory packet including a
number of units of directory information determined by the
second target size;

in the step of dividing the information stream
into information stream portions, the information stream
is divided into a set of plural information packets, the set of
plural information packets including a number of access
points equal to the number of units of directory
information in the directory packet; and

in the interleaving step, the directory packet is
interleaved adjacent the set of information packets.

7. The method of claim 2, wherein, in the
interleaving step:

the bit stream comprises plural layers; and

the information stream portions are interleaved in
a first layer of the bit stream, and the auxiliary
information portions are interleaved in a second layer of
the bit stream, different from the first layer.

8. The method of claim 7, wherein the information
stream comprises plural access units and the auxiliary
information is a set of time stamps for decoding the
access units of the information stream.

9. The method of claim 8, wherein:

in the controlling step the auxiliary information
buffer has an occupancy determined by the second target
size, the auxiliary information fed from the
demultiplexing means, and the auxiliary information
removed by the auxiliary information processor;

the step of dividing the information stream into
information stream portions divides the information stream
into plural information packets;

the step of dividing the auxiliary information
into auxiliary information portions divides the set of
time stamps into time stamps;

the step of interleaving the information stream
additionally includes the step of providing an information
packet header for each information packet; and

in the step of interleaving the information stream
portions and the auxiliary information portions, a time
stamp is included in the information packet header of ones
of the information packets selected according to the
occupancy of the auxiliary information buffer.

10. The method of claim 8, wherein:

in the controlling step the information stream
buffer has a first target size, and the auxiliary
information buffer has a second target size, and the
auxiliary information buffer has an occupancy determined
by the second target size, the auxiliary information fed
from the demultiplexer, and the auxiliary information
removed by the auxiliary information processor;

the step of dividing the information stream into
information stream portions divides the information stream
into plural information packets;

the step of dividing the auxiliary information
into auxiliary information portions divides the set of
time stamps into time stamps;

the step of interleaving the information stream
additionally includes the step of providing an information
packet header for each information packet; and

in the step of interleaving the information stream
portions and the auxiliary information portions, time
stamps are periodically included in the information packet
header of the information packets at a time stamp buffer
frequency; and

in the controlling step, at least one of the time
stamp coding frequency and the second target size is
controlled in such a manner that maximizes the occupancy
of the information stream buffer without causing the
information stream buffer to overflow.

11. The method of claim 7, wherein:

the information stream decoder is one of plural
information stream decoders, the information stream
decoders being phase locked; and

the auxiliary information buffer has a size set to
accommodate one and no more than one unit of the auxiliary
information.

12. An encoder for generating a bit stream, the
encoder comprising:

means for compressing fixed-size units of an
information signal with a varying compression ratio to
provide varying-sized units of an information stream;

information stream dividing means for dividing the
information stream in time into information stream
portions;

auxiliary information dividing means for dividing
non-compressed auxiliary information in time into
auxiliary information portions, the auxiliary information
being for use in subsequently decoding the information
stream, units of the auxiliary information corresponding
to the units of the information signal;

multiplexing means for sequentially arranging the
information stream portions and the auxiliary information
portions to provide the bit stream, the multiplexing means
including a control means for controlling the information
stream dividing means and the auxiliary information
dividing means by emulating decoding of the bit stream by
a system target decoder including a demultiplexing means
for demultiplexing the bit stream, a serial arrangement of
an information stream buffer and an information stream
decoder, and a serial arrangement of an auxiliary
information buffer and an auxiliary information processor,
each of the serial arrangements being connected to the
demultiplexing means, the control means controlling the
information stream dividing means and the auxiliary
information dividing means such that the information
stream buffer and the auxiliary information buffer neither
underflow nor overflow.

13. The encoder of claim 12, wherein:

the demultiplexing means receives the bit stream
and extracts therefrom the information stream and the
auxiliary information for feeding to the information
stream buffer and the auxiliary information buffer,
respectively;

the information stream buffer has a first target
size;

the auxiliary information buffer has a second
target size;

the information stream decoder removes the
varying-sized units of the information stream from the
information stream buffer at a first timing; and

the auxiliary information processor removes the
corresponding fixed-sized units of the auxiliary
information from the auxiliary information buffer at a
second target timing.

14. The encoder of claim 12, wherein:

the bit stream provided by the multiplexing means
comprises plural layers; and

the multiplexing means arranges the information
stream portions and the auxiliary information portions in
the same one of the plural layers of the bit stream.

15. The system of claim 12, wherein:

the bit stream provided by the multiplexing means
comprises plural layers; and

the multiplexing means arranges the time-divided
portions of the information stream in a first layer of
the bit stream and arranges the non-compressed auxiliary
information in a second layer of the bit stream, different
from the first layer.

16. A system wherein an information signal is
compressed for transfer, together with non-compressed
auxiliary information, to a medium as a bit stream, and
wherein the bit stream is transferred from the medium and
is processed to recover the information signal by
expansion, and to recover the auxiliary information, the
auxiliary information being for use in recovering the
information signal, the system comprising:

an encoder comprising:

means for compressing the information signal
to provide an information stream, fixed-sized
units of the information signal being compressed
using a varying compression ratio to provide
varying-sized units of the information stream,
and

multiplexing means for sequentially arranging
time-divided portions of the information stream
and time-divided portions of the non-compressed
auxiliary information to provide the bit stream
for transfer to the medium, the multiplexing
means including a control means for determining
a division of the information stream and of the
auxiliary information into the respective
time-divided portions by emulating decoding of the
bit stream by a system target decoder including
a demultiplexer means for demultiplexing the bit
stream, a serial arrangement of an information
stream buffer and an information stream decoder,
and a serial arrangement of an auxiliary
information buffer and an auxiliary information
processor, each of the serial arrangements being
connected to the multiplexing means, the
information stream buffer and the auxiliary
information buffer each having a size; and

a decoder comprising:

demultiplexing means for extracting the
information stream and the auxiliary
information from the bit stream transferred from
the medium,

first input buffer means for
receiving the auxiliary information from the
demultiplexing means,
the first input buffer means having a size
of at least the size of the auxiliary
information buffer,

means for removing a unit of the auxiliary
information from the first input buffer means,

second input buffer means for receiving the
information stream from the demultiplexing
means,
the second input buffer means having a size of
at least the size of the information stream
buffer, and

decoder means for removing one of the
varying-sized units of the information stream
from the second input buffer means and for
expanding the removed unit of the information
stream to recover the information signal.

17. The system of claim 16, wherein the control means
determines the division of the information stream and of
the auxiliary information into the respective time-divided
portions such that the bit stream, when subject to the
emulated decoding by the system target decoder, causes the
information stream buffer and the auxiliary information
buffer neither to underflow nor overflow.

18. The system of claim 16, wherein:

the bit stream provided by the multiplexing means
has plural layers; and

the multiplexing means arranges the time-divided
portions of the information stream and of the
non-compressed auxiliary information in the same one of the
plural layers of the bit stream.

19. The system of claim 18, wherein the auxiliary
information is directory information relating to the
information stream.

20. The system of claim 19, wherein the information 
stream includes plural access points, and each unit of the 
directory information relates to one of the access points. 

21. The system of claim 19, wherein the control means
determines a division of the directory information into
directory packets each including plural units of directory
information, and determines a division of the information
stream into sets of plural information stream packets,
each set of plural information stream packets including a
number of access points equal to the units of directory
information; and

the multiplexing means multiplexes each directory
packet adjacent the set of information stream packets
including the access points whereto the directory
information in the directory packet relates.

22. The system of claim 16, wherein:

the bit stream provided by the multiplexing means
has plural layers; and

the multiplexing means arranges the time-divided
portions of the information stream in a first layer of the
bit stream and arranges the non-compressed auxiliary
information in a second layer of the bit stream, different
from the first layer.

23. The system of claim 22, wherein the information
stream comprises plural access units and the auxiliary
information is a set of time stamps for decoding the
access units of the information stream.

24. The system of claim 23, wherein:

the auxiliary information buffer has an occupancy
determined by the size of the auxiliary information
buffer, the auxiliary information fed from the
demultiplexer, and the auxiliary information removed by
the auxiliary information processor;

the control means is for:

determining a division of the information stream
into plural information packets and providing an
information packet header for each information packet,

determining a division of the set of time stamps
into time stamps;

sequentially arranging the information stream
packets and the auxiliary information portions, time
stamps are periodically included in the information packet
header of the information packets at a time stamp buffer
frequency; and

controlling at least one of the time stamp coding
frequency and the size of the auxiliary information buffer
in such a manner that maximizes the occupancy of the
information stream buffer without causing the information
stream buffer to overflow.

25. A method of deriving a bit stream from an
information signal, the method comprising the steps of:

compressing units of the information signal to
provide units of an information stream, the units of the
information stream including access points;

deriving from the information stream pointers
pointing to the access points in the information stream; and

multiplexing the information stream divided into
information packets together with pointer packets to
provide the bit stream such that a set of information
packets containing plural consecutive access points is
multiplexed adjacent a pointer packet containing the
pointers pointing only to the plural consecutive access
points.

26. The method of claim 25, wherein:

the multiplexing step multiplexes the information
packets together with pointer packets containing dummy
pointers prior to the deriving step; and

the method additionally comprises the step of
overwriting the dummy pointers with the pointers derived
in the deriving step, the pointers overwritten into each
pointer packet being the pointers pointing to the plural
consecutive access points immediately preceding the
pointer packet in the bit stream.

27. A method of deriving a bit stream from an
information signal, the method comprising the steps of:

providing an encoder including: means for
compressing units of the information signal to provide
units of an information stream, first buffer means, having
a size, for buffering the units of the information stream,
means for providing a time stamp when the first buffer
means receives each access unit of the information stream,
second buffer means, having a size, for buffering the
time stamps, and multiplexing means for multiplexing the
information stream from the first buffer means and the
time stamps from the second buffer means to provide the
bit stream;

defining a hypothetical system target decoder, the
hypothetical system target decoder including a
demultiplexer means for demultiplexing the bit stream, a
serial arrangement of an information stream buffer and an
information stream decoder, a serial arrangement of a
time stamp buffer and a time stamp processor, each serial
arrangement being connected to the demultiplexer;

determining the size of the first buffer means and
the size of the second buffer means by emulating decoding
of the bit stream using the hypothetical system target
decoder; and

encoding the information signal using the encoder
with the size of the first buffer means and the size of the
second buffer means set to the respective sizes determined
by the determining step.

28. The method of claim 27, wherein:

in the step of defining the system target decoder:
the information stream buffer and the time stamp
buffer each have a size, and the information stream
decoder decodes the information stream in response to time
stamps removed from the time stamp buffer by the time stamp
processor; and

in the determining step, the size of the first
buffer means and the size of the second buffer means are
determined from.

29. The method of claim 28, wherein:

in the encoder, the multiplexing means
periodically includes time stamps in the bit stream at a
time stamp coding frequency;

the information stream has a bit rate; and

in the determining step, a buffering delay is
derived from the time stamp coding frequency and the bit
rate, and the size of the information stream buffer and
the size of the time stamp buffer are derived from the
buffering delay.

30. A decoder for a bit stream obtained by
multiplexing non-compressed auxiliary information with an
information stream, the information stream being obtained
by compressing fixed-size units of an information signal
with a varying compression ratio to provide varying-sized
units of the information stream, the auxiliary information
being for use in subsequently decoding the information
stream, units of the auxiliary information corresponding
to the units of the information signal, the decoder
comprising:

demultiplexing means for extracting the
information stream and the auxiliary information from the
bit stream;

first input buffer means for receiving the
auxiliary information from the demultiplexing means;

means for removing a unit of the auxiliary
information from the first input buffer means;

second input buffer means for receiving the
information stream from the demultiplexing means; and

decoder means for removing one of the
varying-sized units of the information stream from the second
input buffer means and expanding the removed unit of the
information stream in response to the unit of the
auxiliary information to recover a fixed-size unit of the
information signal.

31. The decoder of claim 30, wherein the decoder means
removes the one of the varying-sized units of the
information stream from the second input buffer means at a
time indicated by the unit of the auxiliary information.